Transcript
But DH Robotics also just unveiled its latest DH5 dexterous hand, designed with 6 degrees of freedom, 11 joints, and a lifting capacity of 5 kilograms. Each finger delivers 10 newtons of force, while the four fingers feature bending angles of 80 and 90 degrees, and the thumb offers bending angles of 30 and 15 degrees along with a 70-degree lateral swing. Additionally, the hand achieves position repeatability of 0.2 millimeters and completes actions in just 1.5 seconds. And for even more precision, a force sensor can be added to the fingertip to give the robot real-time feedback on its gripping force, all powered by a standard 24-volt DC system.
It even features a self-developed micro electric cylinder that integrates the hand’s drive and control systems for optimal performance in a cost-effective form factor.

At the same time, MIT researchers just unveiled the EyeSight Hand, a groundbreaking humanoid hand with 7 degrees of freedom and integrated vision-based tactile sensors designed to improve whole-hand manipulation. Importantly, the system employs a novel actuation method based on quasi-direct-drive actuation to deliver human-like strength and speed while remaining durable enough for large-scale data collection. And to prove it, the EyeSight Hand was tested on three demanding tasks: bottle opening, plasticine cutting, and plate pick-and-place, which require advanced manipulation, tool use, and precise force control.
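The training recipe behind these results, described next, pairs imitation learning with vision dropout: randomly withholding the camera input during training so the policy cannot ignore touch. As a hedged illustration only, here is a minimal PyTorch sketch of that general modality-dropout idea; the module names, feature dimensions, and dropout rate are hypothetical, not taken from the paper.

import torch
import torch.nn as nn

class TactileVisionPolicy(nn.Module):
    """Toy policy over concatenated vision + tactile features.
    All names and sizes are illustrative (action_dim=7 loosely mirrors
    the hand's 7 degrees of freedom)."""
    def __init__(self, vision_dim=512, tactile_dim=128,
                 action_dim=7, vision_dropout_p=0.3):
        super().__init__()
        self.vision_dropout_p = vision_dropout_p
        self.head = nn.Sequential(
            nn.Linear(vision_dim + tactile_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, vision_feat, tactile_feat):
        if self.training:
            # Vision dropout: zero the whole vision embedding for a
            # random subset of samples so the policy must lean on
            # tactile cues to predict actions.
            keep = (torch.rand(vision_feat.shape[0], 1,
                               device=vision_feat.device)
                    > self.vision_dropout_p).float()
            vision_feat = vision_feat * keep
        return self.head(torch.cat([vision_feat, tactile_feat], dim=-1))

At evaluation time self.training is False, so both modalities pass through unmasked; the dropout only shapes what the policy learns to rely on during training.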
By using imitation learning models trained on these tasks with a vision-dropout strategy, the system demonstrated how tactile feedback enhances task success rates, with the results showing that tactile sensing significantly boosts performance on dexterous manipulation.

And as for generative AI, OpenAI just previewed several next-generation Sora video creation tools as part of its new composer, enabling users to describe their vision while adjusting settings like style, aspect ratio, resolution, duration, and clip variations. Once generated, users can preview clips as well as access several even more interesting features, such as loop, which lets users seamlessly repeat any section of a video by simply selecting a clip.
Users can also adjust the start and end frames using the handles in the editor to create a perfect loop. Then there’s blend, which allows users to merge two videos into something new by choosing two clips to blend; users can even control each video’s influence over time, adjusting trims or settings to refine the result. And re-cut lets users trim or extend any video. For example, here the clip was trimmed to a specific segment while leaving space for Sora to extend it, creating a seamless five-second shot of the robot on the hillside.
And finally, there’s remix, which lets users edit videos with natural language, seamlessly transforming a video while keeping its setting and motion intact.

Meanwhile, Google has just released Gemini 2.0 Flash, its new version with enhanced performance that outperforms even 1.5 Pro on key benchmarks at twice the speed. What’s more, Gemini 2.0 Flash introduces new capabilities, including support for multimodal inputs such as images, video, and audio, as well as multimodal outputs like natively generated images integrated with text and steerable text-to-speech in multiple languages. It also integrates seamlessly with tools like Google Search, code execution, and user-defined third-party functions, enabling a versatile range of applications.
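As a concrete illustration of that tool integration, here is a minimal sketch of calling Gemini 2.0 Flash with the built-in Google Search tool through Google’s google-genai Python SDK; the API key and prompt are placeholders, the model ID reflects the experimental launch name, and the exact SDK surface may vary by version.

# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Ask Gemini 2.0 Flash a question, grounded with the built-in
# Google Search tool so the answer can draw on fresh web results.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model ID at launch
    contents="What did Google announce alongside Gemini 2.0 Flash?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)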
Furthermore, Google announced that Gemini app users can now experience a chat-optimized version of Gemini 2.0 Flash by selecting it from the model drop-down menu on desktop and mobile web. And by early next year, Gemini 2.0 will come to the Google app and extend to even more Google products. As for what it can do, Gemini 2.0 Flash introduces a range of new capabilities, including user-interface interactions, multimodal reasoning, long-context comprehension, complex instruction following, advanced planning, compositional function calling, and seamless integration with native tools. Together, these enhancements enable AI agents to assist effectively with intricate tasks across both digital and physical domains.
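To ground the compositional function calling mentioned above, here is a hedged sketch using the same google-genai SDK, which can wrap a plain Python function as a tool and handle the call round trip; the function, its data, and the prompt are invented for illustration.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

def get_room_temperature(room: str) -> float:
    """Hypothetical user-defined tool the model may decide to call."""
    return {"lab": 21.5, "office": 23.0}.get(room, 20.0)

# Answering this prompt requires composing two tool calls (one per
# room) and comparing the results before responding.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Is the lab warmer than the office right now?",
    config=types.GenerateContentConfig(tools=[get_room_temperature]),
)
print(response.text)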
The first example of this is Project Astra, which is being developed as a kind of universal AI assistant. Built with Gemini 2.0, the latest version of Astra boasts improved dialogue capabilities, supporting multiple and mixed languages with better comprehension of accents and uncommon words. It also gains new tools, including Google Search, Lens, and Maps, to enhance its everyday usefulness. Additionally, Astra now offers up to 10 minutes of in-session memory and remembers past conversations, providing a more personalized experience. And with improved latency, Astra operates at near-human conversational speed. In the near future, Google also plans to bring Astra’s capabilities to more Google products, including the Gemini app, and to extend it to form factors like smart glasses.
And next there’s Project Mariner, an early-stage research prototype exploring the potential of AI agents within a browser. Leveraging Gemini 2.0, Mariner understands and reasons across browser elements like text, code, and images, and by using an experimental Chrome extension, it performs tasks by interacting with web content. Initial evaluations are promising: Mariner achieves a state-of-the-art score on the WebVoyager benchmark. And despite being in its infancy, Mariner is already demonstrating the potential for agents to navigate browsers autonomously. As for safety, current safeguards ensure responsible use, such as requiring user confirmation for sensitive actions and limiting interactions to the active browser tab.
And finally there’s Jules, an experimental coding assistant powered by Gemini 2.0 that’s designed to assist developers directly within their GitHub workflows. It tackles coding tasks by creating plans and executing them under developer supervision. But Google’s work with Gemini 2.0 extends beyond traditional applications, delving into gaming and robotics, too. For instance, Genie 2, a model capable of generating endless 3D worlds, demonstrates how AI can enrich gaming experiences. Similarly, agents built with Gemini 2.0 are being tested with leading game developers to interpret game rules and provide real-time suggestions. Anyways, like and subscribe for more AI news, and watch these bonus clips.
Thanks for watching!