To exhibit intelligence in the physical world, both AI agents and humans must comprehend and then reason about sound, including speech, non-speech sounds, and music. However, research on complex reasoning with audio has lagged behind modalities such as language and vision. This gap stems from several challenges: the limited capabilities of current algorithms for audio understanding, the scarcity of large-scale training datasets, the limitations of existing model architectures, and the lack of comprehensive benchmarks for assessing advanced audio processing capabilities. The recent open-source MMAU benchmark has revealed that even state-of-the-art Large Audio Language Models (LALMs), including proprietary ones, achieve only 53% accuracy on complex audio reasoning tasks. This deficiency represents a crucial bottleneck in the development of multimodal AI systems and in the progression toward AGI. We are embarking on an intensive project to address these critical limitations in foundational LALMs. Our workshop focuses on advancing expert-level understanding and complex reasoning in audio-language models. Our team, which brings together students and senior professionals from multiple disciplines, drawn from several universities and industry partners across the US, Europe, and Asia, positions us to achieve these goals.