Multi-Modal Model Architectures: Integrating Vision and Language | Mahamudul Hasan Rubel