Transcript
For example, if you search for music festivals, SearchGPT doesn’t just list websites the way Google does; instead, it summarises the key details of various events and provides brief descriptions along with links for more information. In another demonstration, SearchGPT explains the best times to plant tomatoes and offers insights into different tomato varieties. Users can also ask follow-up questions or explore additional related links displayed in a sidebar. It also has a special feature called Visual Answers, although OpenAI has yet to elaborate on the specifics of this feature. For now, SearchGPT is still labelled a prototype, so it’s not yet ready for prime time, but we do know it’s powered by the GPT-4 family of models and will initially be accessible to only 10,000 test users.
OpenAI spokesperson Kayla Wood emphasised that the company is collaborating with third-party partners and using direct content feeds to build its search results, with the ultimate goal of integrating these search features directly into ChatGPT itself. The launch of SearchGPT could pose a significant challenge to established search engines like Google and Bing, which have been rapidly incorporating AI features into their own platforms to stay ahead of OpenAI, and Google’s efforts in particular indicate a clear concern that users might switch to new AI-driven alternatives like SearchGPT. Additionally, SearchGPT puts OpenAI in direct competition with Perplexity, a startup that markets itself slightly differently, as an AI answer engine.
Perplexity, however, has faced criticism for an AI summaries feature that some publishers claimed was essentially plagiarising their work. In response, OpenAI has chosen a different approach, emphasising collaboration with various news partners, including major organisations like The Wall Street Journal, The Associated Press and Vox Media. OpenAI has also been proactive in addressing other potential ethical issues with SearchGPT: the company developed the search engine in collaboration with a host of news partners, ensuring that publishers have a say in how their content is used. Additionally, publishers can choose to opt out of having their content used to train OpenAI’s models while still appearing in search results.
SearchGPT is also designed to help users connect with publishers by prominently citing and linking to them in searches. Responses have clear, in-line, named attribution and links, so users know where information is coming from and can quickly engage with even more results via source links in a sidebar. As for cost, SearchGPT will be free during its initial launch, with no ads currently in place, though OpenAI is developing a monetisation strategy. Overall, the initial release of SearchGPT is intended to serve as a prototype that can undergo further testing and refinement, with OpenAI aiming to expand its user base and continue integrating the search engine into its core chat products.
And the company’s focus on ethics, collaboration with publishers, and a simple interface all position SearchGPT as a potential disruptor of the traditional search engine market. Meanwhile, a new method from NVIDIA’s Toronto AI lab called Simplicits may forever change how we create and interact with 3D simulations. This breakthrough revolves around Simplicits’ ability to handle a wide range of 3D representations without relying on traditional datasets or meshes, using cutting-edge neural fields instead. This means that Simplicits can simulate everything from complex geometries and point clouds to tomography scans and even neural representations, all while accurately accounting for physics-based material properties.
But what makes Simplicits truly remarkable is its raw versatility. For example, it can simulate an entire dataset of objects using just one object representation and an occupancy function, which determines whether any given point in 3D space is inside or outside the object. Each point within the object is governed by a combination of weights and skinning handles, much like how skinning controls characters in video game animation. On top of this, the weights across the object are represented by a continuous function that can be evaluated at any resolution. To do this, a neural network is trained to approximate this function, mapping 3D points to a vector of weights, one per control handle.
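To make that concrete, here is a minimal sketch of the idea: a small neural network that maps a 3D query point to a vector of skinning weights, one per control handle. The class name, layer sizes, and the softmax normalisation are illustrative assumptions for this sketch, not Simplicits’ actual implementation.

```python
# A minimal PyTorch sketch of a continuous weight field: 3D point -> per-handle weights.
# Architecture details here are assumptions, not the paper's code.
import torch
import torch.nn as nn

class WeightField(nn.Module):
    def __init__(self, num_handles: int = 16, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, num_handles),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) sample locations inside the object.
        # Returns (N, num_handles) skinning weights; the softmax keeps them
        # positive and summing to one per point (a simplifying assumption).
        return torch.softmax(self.net(points), dim=-1)

# Because the weight function is continuous, it can be queried at any resolution.
field = WeightField(num_handles=16)
query = torch.rand(1024, 3)   # points sampled in the object's bounding box
weights = field(query)        # (1024, 16)
```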
This training process uses a simple loss function over spatial sample points, ensuring an accurate representation of the object’s interior. But one of Simplicits’ most groundbreaking features is its data-free training process: unlike traditional methods that require extensive pre-existing data, Simplicits generates deformations on its own by sampling affine transformations. It then uses numerical methods, such as Newton’s method, to solve the equations that describe how objects deform over time. These simulations are highly efficient, running at near-interactive speeds on modern GPUs like the RTX 3070. Additionally, Simplicits can simulate realistic material behaviours using parameters like Neo-Hookean elasticity.
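As a rough illustration of that data-free idea, the sketch below (reusing the WeightField sketch above) samples random affine transforms for each handle, blends them with the predicted weights, and penalises the resulting Neo-Hookean elastic energy. Treating the weight-blended affine part as the deformation gradient ignores the spatial gradient of the weights, which is a simplifying assumption for illustration only; this is not the paper’s exact loss.

```python
# Hedged sketch: data-free training via randomly sampled affine handle transforms
# and a standard compressible Neo-Hookean energy. Illustrative only.
import torch

def neo_hookean_energy(F: torch.Tensor, mu: float = 1.0, lam: float = 1.0) -> torch.Tensor:
    # F: (N, 3, 3) deformation gradients.
    J = torch.det(F).clamp(min=1e-6)
    I1 = (F * F).sum(dim=(-2, -1))   # tr(F^T F)
    return (0.5 * mu * (I1 - 3.0) - mu * torch.log(J)
            + 0.5 * lam * torch.log(J) ** 2).mean()

def training_step(field, optimizer, num_points=2048, num_handles=16, scale=0.1):
    X = torch.rand(num_points, 3)          # spatial samples inside the object
    w = field(X)                           # (N, H) skinning weights from the network
    # Random affine transforms near the identity, one per handle.
    A = torch.eye(3) + scale * torch.randn(num_handles, 3, 3)
    # Approximate each point's deformation gradient as the weight-blended affine part.
    F = torch.einsum("nh,hij->nij", w, A)
    loss = neo_hookean_energy(F)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: optimizer = torch.optim.Adam(field.parameters(), lr=1e-3), then call
# training_step(field, optimizer) in a loop; no pre-existing deformation data is needed.
```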
This allows it to produce results that closely match the accuracy of gold-standard finite element method simulations, which are traditionally very resource-intensive. As a result, Simplicits can handle objects with diverse material properties and complex geometric structures, bringing a new level of realism to 3D simulations. And finally, NVIDIA researchers have released another AI breakthrough called Visual Fact Checker, or VFC for short, which may transform the way we caption 2D images and 3D objects. Here’s how it works: by combining multimodal models and large language models, VFC ensures accurate and reliable descriptions, setting new standards in the field.
Visual Fact Checker starts off by using two multimodal captioning models named Captioner 1 and Captioner 2, which generate preliminary captions for an input image. These captions are then verified by an LLM using object detection techniques. The LLM integrates all verified information and produces a final, accurate caption following specific instructions. For 3D objects, VFC generates captions for individual views using Captioner 1 and Captioner 2. Each view’s caption is fact-checked by the LLM with visual question-answering models. The LLM then compiles the results into a final caption for each view. Afterward, it synthesizes these captions into a single, comprehensive description of the entire 3D object.
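The control flow described above can be sketched roughly as follows. The captioners, detector, VQA model, and LLM are passed in as callables because the actual backends are not specified here; everything in this sketch is an illustrative skeleton, not VFC’s released code.

```python
# Hedged sketch of a caption-then-verify pipeline for 2D images and 3D views.
from typing import Callable, List

def caption_2d(image,
               captioners: List[Callable],   # image -> draft caption
               detect: Callable,             # (image, object name) -> bool
               llm: Callable) -> str:        # prompt -> text
    drafts = [c(image) for c in captioners]                    # two preliminary captions
    claimed = llm(f"List the objects mentioned in: {drafts}").splitlines()
    verified = [obj for obj in claimed if detect(image, obj)]  # check claims with detection
    return llm(f"Write one caption using only these verified facts: {verified}")

def caption_3d(views,
               captioners: List[Callable],
               vqa: Callable,                # (view, question) -> answer
               llm: Callable) -> str:
    per_view = []
    for view in views:
        drafts = [c(view) for c in captioners]
        questions = llm(f"Write yes/no questions that verify: {drafts}").splitlines()
        answers = {q: vqa(view, q) for q in questions}         # per-view fact-check via VQA
        per_view.append(llm(f"Caption this view given drafts {drafts} and checks {answers}"))
    # Finally, merge the per-view captions into one description of the whole object.
    return llm(f"Combine these view captions into a single description: {per_view}")
```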
Most importantly, Visual Fact Checker’s effectiveness is evaluated using four key metrics. The first is CLIP-Score, which measures the similarity between the image and the generated text. The second is CLIP-Image-Score, which assesses the similarity between the original image and an image reconstructed from the caption. The third is a human study, with evaluations carried out via Amazon Mechanical Turk, and the fourth is automated evaluation with GPT-4V (GPT-4 with vision). In evaluations on 2D images, VFC excels on the COCO dataset, while for 3D objects it demonstrates superior performance on the Objaverse dataset.
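For reference, here is a hedged sketch of how a CLIP-Score-style check can be computed with the openly available CLIP model on Hugging Face: embed the image and the generated caption, then take their cosine similarity. The checkpoint and the 100x scaling are common conventions, not necessarily the paper’s exact evaluation setup.

```python
# Sketch: CLIP-Score-style similarity between an image and a generated caption.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    sim = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
    return 100.0 * max(sim, 0.0)   # common CLIP-Score convention

# Example: score a generated caption against its source image.
# print(clip_score(Image.open("photo.jpg"), "a dog playing in the park"))
```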