Can robots learn from machine dreams?
For roboticists, one challenge towers above all others: generalization — the ability to create machines that can adapt to any environment or condition. Since the 1970s, the field has evolved from writing sophisticated programs to using deep learning, teaching robots to learn directly from human behavior. But a critical bottleneck remains: data quality. To improve, robots need to encounter scenarios that push the boundaries of their capabilities, operating at the edge of their mastery. This process traditionally requires human oversight, with operators carefully challenging robots to expand their abilities. As robots become more sophisticated, this hands-on approach hits a scaling problem: the demand for high-quality training data far outpaces humans’ ability to provide it.

Now, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers has developed a novel approach to robot training that could significantly accelerate the deployment of adaptable, intelligent machines in real-world environments. The new system, called “LucidSim,” uses recent advances in generative AI and physics simulators to create diverse and realistic virtual training environments, helping robots achieve expert-level performance in difficult tasks without any real-world data.

LucidSim combines physics simulation with generative AI models, addressing one of the most persistent challenges in robotics: transferring skills learned in simulation to the real world. “A fundamental challenge in robot learning has long been the ‘sim-to-real gap’ — the disparity between simulated training environments and the complex, unpredictable real world,” says MIT CSAIL postdoc Ge Yang, a lead researcher on LucidSim. “Previous approaches often relied on depth sensors, which simplified the problem but missed crucial real-world complexities.”

The multipronged system is a blend of different technologies.
At its core, LucidSim uses large language models to generate varied, structured descriptions of environments. These descriptions are then transformed into images using generative models. To ensure that these images reflect real-world physics, an underlying physics simulator is used to guide the generation process.

The birth of an idea: From burritos to breakthroughs

The inspiration for LucidSim came from an unexpected place: a conversation outside Beantown Taqueria in Cambridge, Massachusetts. “We wanted to teach vision-equipped robots how to improve using human feedback. But then, we realized we didn’t have a pure vision-based policy to begin with,” says Alan Yu, an undergraduate student in electrical engineering and computer science (EECS) at MIT and co-lead author on LucidSim. “We kept talking about it as we walked down the street, and then we stopped outside the taqueria for about half an hour. That’s where we had our moment.”

To cook up their data, the team generated realistic images by extracting depth maps, which provide geometric information, and semantic masks, which label different parts of an image, from the simulated scene. They quickly realized, however, that with tight control over the composition of the image content, the model produced nearly identical images from the same prompt. So they devised a way to source diverse text prompts from ChatGPT.

This approach, however, only produced single images. To make short, coherent videos that serve as little “experiences” for the robot, the scientists developed a second technique, called “Dreams In Motion.” The system computes the movement of each pixel between frames to warp a single generated image into a short, multi-frame video. Dreams In Motion does this by considering the 3D geometry of the scene and the relative changes in the robot’s perspective.
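The per-pixel warp at the heart of this idea follows standard pinhole-camera geometry: back-project each pixel to 3D using its depth, apply the relative camera motion, and re-project into the new view. Below is a minimal NumPy sketch of that geometric warp; the function name, the nearest-neighbour scatter, and the absence of occlusion handling are simplifications of mine, not details from the paper.

```python
import numpy as np

def warp_to_new_view(image, depth, K, R, t):
    """Forward-warp an image to a new viewpoint using its depth map.

    K is the 3x3 camera intrinsics matrix; (R, t) is the rigid motion
    applied to the 3D points between frames. A production system would
    also need occlusion reasoning and hole filling; this sketch uses a
    nearest-neighbour scatter for brevity.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels

    pts = (np.linalg.inv(K) @ pix) * depth.ravel()          # back-project to 3D
    pts = R @ pts + t[:, None]                              # apply relative motion
    proj = K @ pts                                          # re-project to pixels
    u2 = np.round(proj[0] / proj[2]).astype(int).clip(0, w - 1)
    v2 = np.round(proj[1] / proj[2]).astype(int).clip(0, h - 1)

    warped = np.zeros_like(image)
    warped[v2, u2] = image[v.ravel(), u.ravel()]            # scatter source pixels
    return warped
```

Chaining such warps over a sequence of camera poses yields a short multi-frame clip from a single generated image; the per-pixel displacement (u2 - u, v2 - v) is exactly the optical flow implied by the scene geometry and the change in viewpoint.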
“We outperform domain randomization, a method developed in 2017 that applies random colors and patterns to objects in the environment, which is still considered the go-to method these days,” says Yu. “While this technique generates diverse data, it lacks realism. LucidSim addresses both diversity and realism problems. It’s exciting that even without seeing the real world during training, the robot can recognize and navigate obstacles in real environments.”

The team is particularly excited about the potential of applying LucidSim to domains outside quadruped locomotion and parkour, their main test bed. One example is mobile manipulation, where a mobile robot is tasked with handling objects in an open area and where color perception is critical. “Today, these robots still learn from real-world demonstrations,” says Yang. “Although collecting demonstrations is easy, scaling a real-world robot teleoperation setup to thousands of skills is challenging because a human has to physically set up each scene. We hope to make this easier, and thus qualitatively more scalable, by moving data collection into a virtual environment.”

Who’s the real expert?

The team put LucidSim to the test against an alternative, where an expert teacher demonstrates the skill for the robot to learn from. The results were surprising: Robots trained by the expert struggled, succeeding only 15 percent of the time — and even quadrupling the amount of expert training data barely moved the needle. But when robots collected their own training data through LucidSim, the story changed dramatically. Just doubling the dataset size catapulted success rates to 88 percent. “And giving our robot more data monotonically improves its performance — eventually, the student becomes the expert,” says Yang.
“One of the main challenges in sim-to-real transfer for robotics is achieving visual realism in simulated environments,” says Stanford University assistant professor of electrical engineering Shuran Song, who wasn’t involved in the research. “The LucidSim framework provides an elegant solution by using generative models to create diverse, highly realistic visual data for any simulation. This work could significantly accelerate the deployment of robots trained in virtual environments to real-world tasks.”

From the streets of Cambridge to the cutting edge of robotics research, LucidSim is paving the way toward a new generation of intelligent, adaptable machines — ones that learn to navigate our complex world without ever setting foot in it.

Yu and Yang wrote the paper with four fellow CSAIL affiliates: Ran Choi, an MIT postdoc in mechanical engineering; Yajvan Ravan, an MIT undergraduate in EECS; John Leonard, the Samuel C. Collins Professor of Mechanical and Ocean Engineering in the MIT Department of Mechanical Engineering; and Phillip Isola, an MIT associate professor.
A model of virtuosity
A crowd gathered at the MIT Media Lab in September for a concert by musician Jordan Rudess and two collaborators. One of them, violinist and vocalist Camilla Bäckman, has performed with Rudess before. The other — an artificial intelligence model informally dubbed the jam_bot, which Rudess developed with an MIT team over the preceding several months — was making its public debut as a work in progress.

Throughout the show, Rudess and Bäckman exchanged the signals and smiles of experienced musicians finding a groove together. Rudess’ interactions with the jam_bot suggested a different and unfamiliar kind of exchange. During one duet inspired by Bach, Rudess alternated between playing a few measures and allowing the AI to continue the music in a similar baroque style. Each time the model took its turn, a range of expressions moved across Rudess’ face: bemusement, concentration, curiosity. At the end of the piece, Rudess admitted to the audience, “That is a combination of a whole lot of fun and really, really challenging.”

Rudess is an acclaimed keyboardist — the best of all time, according to one Music Radar magazine poll — known for his work with the platinum-selling, Grammy-winning progressive metal band Dream Theater, which embarks this fall on a 40th anniversary tour. He is also a solo artist whose latest album, “Permission to Fly,” was released on Sept. 6; an educator who shares his skills through detailed online tutorials; and the founder of software company Wizdom Music. His work combines a rigorous classical foundation (he began his piano studies at The Juilliard School at age 9) with a genius for improvisation and an appetite for experimentation.

Last spring, Rudess became a visiting artist with the MIT Center for Art, Science and Technology (CAST), collaborating with the MIT Media Lab’s Responsive Environments research group on the creation of new AI-powered music technology.
Rudess’ main collaborators in the enterprise are Media Lab graduate students Lancelot Blanchard, who researches musical applications of generative AI (informed by his own studies in classical piano), and Perry Naseck, an artist and engineer specializing in interactive, kinetic, light- and time-based media. Overseeing the project is Professor Joseph Paradiso, head of the Responsive Environments group and a longtime Rudess fan. Paradiso arrived at the Media Lab in 1994 with a CV in physics and engineering and a sideline designing and building synthesizers to explore his avant-garde musical tastes. His group has a tradition of investigating musical frontiers through novel user interfaces, sensor networks, and unconventional datasets.

The researchers set out to develop a machine learning model channeling Rudess’ distinctive musical style and technique. In a paper published online by MIT Press in September, co-authored with MIT music technology professor Eran Egozy, they articulate their vision for what they call “symbiotic virtuosity”: for human and computer to duet in real time, learning from each duet they perform together, and making performance-worthy new music in front of a live audience.

Rudess contributed the data on which Blanchard trained the AI model. Rudess also provided continuous testing and feedback, while Naseck experimented with ways of visualizing the technology for the audience. “Audiences are used to seeing lighting, graphics, and scenic elements at many concerts, so we needed a platform to allow the AI to build its own relationship with the audience,” Naseck says. In early demos, this took the form of a sculptural installation with illumination that shifted each time the AI changed chords. During the concert on Sept. 21, a grid of petal-shaped panels mounted behind Rudess came to life through choreography based on the activity and future generation of the AI model.
“If you see jazz musicians make eye contact and nod at each other, that gives anticipation to the audience of what’s going to happen,” says Naseck. “The AI is effectively generating sheet music and then playing it. How do we show what’s coming next and communicate that?”

Naseck designed and programmed the structure from scratch at the Media Lab with assistance from Brian Mayton (mechanical design) and Carlo Mandolini (fabrication), drawing some of its movements from an experimental machine learning model, developed by visiting student Madhav Lavakare, that maps music to points moving in space. With the ability to spin and tilt its petals at speeds ranging from subtle to dramatic, the kinetic sculpture distinguished the AI’s contributions during the concert from those of the human performers, while conveying the emotion and energy of its output: swaying gently when Rudess took the lead, for example, or furling and unfurling like a blossom as the AI model generated stately chords for an improvised adagio.

The latter was one of Naseck’s favorite moments of the show. “At the end, Jordan and Camilla left the stage and allowed the AI to fully explore its own direction,” he recalls. “The sculpture made this moment very powerful — it allowed the stage to remain animated and intensified the grandiose nature of the chords the AI played. The audience was clearly captivated by this part, sitting at the edges of their seats.”

“The goal is to create a musical visual experience,” says Rudess, “to show what’s possible and to up the game.”

Musical futures

As the starting point for his model, Blanchard used a music transformer, an open-source neural network architecture developed by MIT Assistant Professor Anna Huang SM ’08, who joined the MIT faculty in September. “Music transformers work in a similar way as large language models,” Blanchard explains.
“The same way that ChatGPT would generate the most probable next word, the model we have would predict the most probable next notes.”

Blanchard fine-tuned the model using Rudess’ own playing of elements from bass lines to chords to melodies, variations of which Rudess recorded in his New York studio. Along the way, Blanchard ensured the AI would be nimble enough to respond in real time to Rudess’ improvisations. “We reframed the project,” says Blanchard, “in terms of musical futures that were hypothesized by the model and that were only being realized at the moment based on what Jordan was deciding.”
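The next-note prediction Blanchard describes is the standard autoregressive sampling loop used by language models, applied to a vocabulary of note tokens rather than words. The sketch below illustrates that loop only; the `model` callable and token scheme are placeholders of mine, not the team's actual architecture.

```python
import numpy as np

def generate_notes(model, prompt, n_new, temperature=1.0, rng=None):
    """Extend a sequence of note tokens one prediction at a time.

    `model(tokens)` is assumed to return unnormalized logits over the
    note vocabulary for the next token, the musical analogue of an LLM
    scoring the next word.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    tokens = list(prompt)
    for _ in range(n_new):
        logits = np.asarray(model(tokens), dtype=float) / temperature
        probs = np.exp(logits - logits.max())   # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens
```

Lowering `temperature` keeps the continuation close to the model's most probable notes; raising it invites more adventurous improvisation, the same knob that trades coherence against surprise in text generation.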
Bold, Confident, and Timeless: Introducing SKIMS x Dolce & Gabbana
Kim Kardashian’s shapewear label, SKIMS, has unveiled an exciting collaboration with the iconic Italian luxury brand Dolce & Gabbana. This limited-edition collection, launching on November 19, 2024, promises to merge the sophistication of Italian design with the unparalleled comfort and functionality of SKIMS shapewear.

The SKIMS x Dolce & Gabbana collection is a celebration of the female silhouette, embodying sensuality, confidence, and effortlessness. By combining Dolce & Gabbana’s print-forward, couture-inspired aesthetics with SKIMS’ expertise in comfort and fit, this collaboration creates a wardrobe designed to inspire and empower. With the holiday season approaching, the collection arrives just in time for Thanksgiving and Christmas shoppers looking for the perfect gift or an indulgent treat. From silk lingerie to sculpting bodysuits, these pieces elevate everyday wear with bold yet refined touches.

Kim Kardashian offered fans a glimpse of the collaboration through her Instagram, showcasing an array of luxurious pieces: classic two-piece sets in pristine white; leopard-print bodysuits that exude boldness and flair; sheer black mesh corsets paired with over-the-knee stockings; sculpting leggings and solid black bodysuits for a sleek silhouette; and statement D&G accents, including scarves, bandanas, and jewelry. Each item is adorned with Dolce & Gabbana’s signature elegance and SKIMS’ iconic logo, creating a distinct look that effortlessly transitions from day to night.
Donald Trump Taps Dr. Oz to Head U.S. Medicaid & Medicare
President-elect Donald Trump has nominated Dr. Mehmet Oz, a well-known television personality and surgeon, to lead the Centers for Medicare and Medicaid Services (CMS). Oz, who will oversee programs impacting millions of Americans, is known for his media presence and health advocacy. However, some have criticized his promotion of treatments lacking scientific support. Trump’s selection of Oz highlights his unconventional approach to leadership, blending celebrity influence with healthcare reform goals.