Running AI models often seems resource-intensive and limited to expensive hardware such as GPUs. My talk demonstrates how you can use various Python libraries to unlock the potential of AI on CPUs: optimize costs, scale AI across diverse devices, and manage efficient AI pipelines from local PCs to the cloud.
In my talk I’ll demonstrate how Python developers can run advanced AI models (e.g. Qwen chat and vision-language models) efficiently on CPUs, using lightweight and accessible libraries. This approach is well suited to individual developers running personal AI projects on a budget, as well as to enterprises aiming to scale AI pipelines cost-effectively. I’ll show how to deploy state-of-the-art tools for conversational AI and multimodal understanding directly on a CPU. Whether you’re tinkering with AI at home or deploying applications at scale, this method offers a practical, accessible, and resource-friendly solution.
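As a flavor of the kind of CPU-efficiency technique involved, here is a minimal sketch using PyTorch dynamic quantization, which converts a model's linear-layer weights to int8 so inference runs faster and leaner on a CPU. The tiny stand-in model below is purely illustrative (the talk's specific libraries and models are not fixed here); a real Qwen checkpoint would be loaded through its own library instead.

```python
import torch
import torch.nn as nn

# Tiny stand-in model, purely for illustration; in practice you would
# load a real checkpoint (e.g. a Qwen model) via its library on CPU.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
model.eval()

# Dynamic quantization: Linear weights are stored as int8 and
# dequantized on the fly, cutting memory use and speeding up CPU inference.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference runs entirely on CPU, no GPU required.
x = torch.randn(1, 64)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)  # torch.Size([1, 8])
```

The same idea scales up: quantized or otherwise compressed weights are what make multi-billion-parameter models practical on commodity CPU hardware.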
I’m a passionate software engineer currently serving as the Technical Director of the R&D AI Unit “MadnessLab” at Zucchetti Centro Sistemi. With a career spanning roles as a backend and DevSecOps engineer, I’ve gained expertise in a wide range of tools and programming languages. Today, I’m focusing on designing and deploying efficient Java, Angular, and Python applications, with a particular emphasis on building AI solutions optimized for robust and scalable cloud deployment.