Transcript
But this new version of the Quavo robot also now has a dramatically improved human-like gait and stride, with the robot’s movement control system now including a cross-terrain adaptive gait that enables autonomous navigation across varied surfaces. This is further complemented by a multipath bipedal walking algorithm that upgrades the robot’s mobility and stability, giving it a significant improvement in naturalness over previous generations of humanoids. Additionally, the robot demonstrated its upgraded embodied intelligence, featuring imitation learning with real-time data collection and processing. This allows Quavo to adapt to and learn from its environment in real time, a crucial capability for handling dynamic real-world tasks.
And next, the robot also demoed coordinated control of its upper and lower body while performing tasks that require simultaneous use of its limbs, showcasing upgraded full-body task orchestration. Along with that full-body coordination, the upgraded Quavo adds a disturbance-resistant motion algorithm for increased balance and reliability, even in changing environments. And as for visual processing, Quavo gave a computer vision demonstration, using inverse kinematics to drive the robot’s end-effectors for positioning and grasping tasks, with results suggesting the robot can already handle the majority of unskilled pick-and-place tasks.
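For a sense of what that inverse-kinematics step involves, here is a minimal, generic sketch for a two-link planar arm: given a target point for the end-effector, it solves for the two joint angles. This illustrates the general technique only, not Quavo’s actual controller, and the link lengths are made-up values.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Analytic inverse kinematics for a planar 2-link arm.

    Returns (shoulder, elbow) joint angles in radians that place the
    end-effector at target (x, y), for link lengths l1 and l2.
    Illustrative only; real humanoid arms have many more joints.
    """
    # Law of cosines gives the elbow angle from the target distance.
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(c2)  # elbow-down solution
    # Shoulder angle: direction to the target minus the offset from the bent elbow.
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

# Example: reach 0.5 m forward and 0.2 m up with 0.4 m and 0.3 m links.
print(two_link_ik(0.5, 0.2, 0.4, 0.3))
```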
But what’s probably most impressive is the robot’s newfound dexterity in its hands, each of which has six degrees of freedom and is equipped with tactile sensors. As for speed, these hands can open and close in 0.8 seconds, with a minimum weight recognition threshold of less than 1 gram and a force accuracy of 0.01 newtons. Additionally, Quavo ships with tools for researchers and developers: its software development kit supports multiple programming languages and includes features for speech recognition, audio detection, movement configuration, self-operating ability, and vision processing.
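To make that feature list concrete, here is a minimal sketch of how such an SDK might be scripted against. The transcript does not document the actual API, so the interface below (the QuavoLike protocol and every method on it) is a hypothetical stand-in, not the vendor’s SDK.

```python
# Hypothetical sketch only: QuavoLike and all of its methods are placeholder
# names invented for illustration; they are not the documented Quavo SDK API.
from dataclasses import dataclass
from typing import List, Protocol, Tuple

@dataclass
class Detection:
    label: str
    pose: Tuple[float, float, float]  # (x, y, z) in meters; stand-in for a full pose

class QuavoLike(Protocol):
    """Stand-in for whatever interface the real SDK exposes."""
    def listen(self, timeout_s: float) -> str: ...
    def detect_objects(self) -> List[Detection]: ...
    def move_arm_to(self, pose: Tuple[float, float, float]) -> None: ...
    def close_hand(self, force_n: float) -> None: ...

def pick_named_object(robot: QuavoLike) -> bool:
    """Listen for a spoken command, find a matching detected object, and grasp it."""
    command = robot.listen(timeout_s=5.0)      # e.g. "pick up the red cube"
    for obj in robot.detect_objects():
        if obj.label in command:
            robot.move_arm_to(obj.pose)
            robot.close_hand(force_n=0.5)      # hands report 0.01 N force accuracy
            return True
    return False
```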
Plus, developers can even control Quavo’s arms and hands manually from a keyboard while testing. And to get started easily, Quavo comes with a database of pre-made use cases, with over 20 demo files that developers can employ and adapt into existing projects, or use as a foundation for new ones. Plus, there’s also an integrated large language model to assist with development tasks, helping developers troubleshoot issues, generate code snippets, and suggest optimizations for their robotics projects. And simultaneously, Google DeepMind’s Veo video generation model is being merged into YouTube’s Shorts feature, soon allowing users to effortlessly produce high-quality short videos at 1080p resolution.
In fact, these AI-generated videos can extend beyond one minute in duration and encompass a wide range of cinematic and visual styles. But one of Veo’s key strengths lies in its sophisticated understanding of language and vision, as the model accurately interprets text prompts and combines that information with relevant visual references to produce coherent scenes. This capability allows an unprecedented level of creative control, enabling users to specify cinematic effects such as time-lapses or aerial landscape shots, and the model’s filmmaking controls even extend to editing functionality. For instance, when provided with an input video and an editing command, such as adding kayaks to an aerial coastline shot, Veo can apply the command to create a new, edited video.
Plus, Veo also demonstrates the ability to generate and extend videos to 60 seconds and beyond, either from a single prompt or from a sequence of prompts describing a story. But Google emphasizes responsible development and deployment of its video generation technology, with videos created by Veo being watermarked using SynthID to accurately identify AI-generated content. Additionally, all output passes through safety filters and memorization-checking processes to further mitigate risks related to privacy, copyright, and bias. And to compete with Veo, Adobe has announced upcoming AI-powered features for its video editing software, aimed at enhancing creative workflows.
The company’s Firefly Video model, trained on public-domain and licensed content, will introduce several new capabilities to assist editors. Its text-to-video functionality will allow users to generate B-roll footage using text prompts, camera controls, and reference images, a tool that can help fill gaps in timelines and explore visual-effects concepts. Plus, the image-to-video feature can animate still images and illustrations too. Another forthcoming tool, called Generative Extend, is also planned for the Premiere Pro beta to lengthen clips, smooth transitions, and adjust shot durations for precise timing. But Adobe emphasizes that these AI features are developed with consideration for creators’ rights and commercial safety, stating that user content is not used to train the AI model.
As for release, these new features are expected to be available in beta form soon, as Adobe aims to address common editing challenges and streamline the creative process for video professionals with these AI-assisted tools. At the same time, Adobe has just released Firefly Image 3, which introduces a range of advanced features. To start, one of Image 3’s key advancements is its auto-stylization capability. Powered by a new style engine, this feature delivers higher-quality outputs with increased variety, giving users more control and personalization over the styles of images they generate, with outputs spanning new varieties of styles, colors, backgrounds, and subject poses.
This feature is intended to streamline the creative ideation process and enable faster exploration of visual concepts. But even more significant is the introduction of structure reference and style reference. Basically, structure reference allows users to generate new images that match the structure of a reference image, eliminating the need for precise prompt writing. Meanwhile, style reference offers higher-quality outputs with greater control over generated styles. And when combined, these features enable users to reference both the structure and style of an image simultaneously, facilitating rapid visualization of ideas. But Firefly Image 3 also boasts improvements in photographic quality, delivering enhanced lighting, positioning, and variety in generated images.
Notable advancements can be seen in people rendering, with more detailed features and a wider range of moods and expressions. Moreover, the tool also shows improved capabilities in generating complex structures and crowds. Plus, the new version demonstrates a better understanding of text prompts and scenes, allowing for more accurate generations that reflect long, complex prompts and include richer details. Text rendering has also been improved, resulting in clearer text in generated images. And illustration and icon creation with Firefly Image 3 now offer significant improvements as well, enabling even quicker creation of icons, logos, raster images, and line art.
And finally, Xgrids has introduced its Lixel L2 Pro, a cutting-edge mobile simultaneous localization and mapping scanner, to usher in the age of real-time 3D data capture and processing. The Lixel L2 Pro combines lidar, visual, and inertial measurement unit modules with advanced AI capabilities to deliver real-time point cloud data of unprecedented quality. This leap in technology allows the L2 Pro to produce output that rivals the accuracy and detail of post-processed data from previous generations, effectively ushering in what Xgrids calls the zero post-processing era for SLAM devices. But one of the most significant advancements of the L2 Pro is its ability to generate georeferenced data in real time with absolute accuracy of up to three centimeters, thanks to its high-precision real-time kinematic capabilities.
This feature is particularly valuable for applications requiring precise spatial data, such as topographic surveying and engineering projects. In fact, the device boasts a 300-meter scan range, a point capture rate of 640,000 points per second, and the ability to achieve one-millimeter point cloud spacing. And with a relative accuracy of one centimeter and a real-time absolute accuracy of three centimeters, the L2 Pro sets a new standard for mobile scanning devices. Additionally, Xgrids has also introduced its Lixel upsampling algorithm, which further enhances the quality of the point cloud data by increasing point cloud density to one million points per square meter at one-millimeter spacing, capturing minute details with exceptional precision.
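As a quick sanity check on those figures, a uniform one-millimeter point spacing works out to exactly one million points per square meter, and at the 640,000-points-per-second capture rate that density corresponds to covering roughly 0.64 square meters of surface per second before any upsampling. A purely illustrative back-of-the-envelope in code:

```python
# Back-of-the-envelope check of the quoted specs (illustrative arithmetic only).
spacing_m = 0.001                        # 1 mm point spacing
points_per_m2 = (1 / spacing_m) ** 2     # 1,000 x 1,000 = 1,000,000 points per m^2
capture_rate_pts_s = 640_000             # hardware capture rate, points per second
coverage_m2_per_s = capture_rate_pts_s / points_per_m2
print(points_per_m2, coverage_m2_per_s)  # 1000000.0 0.64
```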
Impressively, the algorithm achieves a five-millimeter point cloud thickness for even more accurate mapping and line drawing. Furthermore, the device has an exclusive multi-SLAM algorithm to enhance adaptability and reliability in challenging scenarios, such as indoor spaces with limited satellite signals or degraded environments like subways and tunnels. And to tie it all together, Xgrids has developed a comprehensive software ecosystem: the LixelGO companion app offers easy control and real-time viewing of camera images, elevation information, and true-color point clouds, and this kind of instant feedback helps users avoid missing or incorrect data collection during scanning operations.
And for post-capture processing, the LixelStudio software provides a suite of tools for viewing, editing, and processing point cloud data. Powered by spatial intelligence algorithms, it includes features such as one-click processing, panoramic overlay, map fusion, and advanced editing tools, with the software also providing industry-specific plug-ins to cater to applications in construction, agriculture, forestry, and film production.