Haotian Liu, a fifth-year University of Wisconsin Ph.D. student, alongside his Ph.D. advisor, Young Jae Lee, has developed LLaVA, an artificial intelligence software.
As open artificial intelligence software has peaked public interest, the race develop new AI software continues. Researchers and students have begun to challenge AI models, developing models that can be faster, more powerful and much more accessible.
Liu said he began developing LLaVA in March 2023, just when open AI software like ChatGPT emerged. Liu said the principal difference between the two software programs is that LLaVA has visual processing capabilities.
“LLaVA demonstrates lots of very fascinating capabilities like that of a language model, where it can not only chat with human language with text, but it can also see the visual world and do complex reasoning,” Liu said.
Liu said LLaVA could understand humor and unusual elements in images. Liu also expressed that he would like to see LLaVA have capabilities to assist those who are visually impaired.
Working in academia, Liu uses resources from his lab that do not scale to the magnitude of what technology companies have. Liu specifically referenced graphics processing unit capabilities, where tech companies can use hundreds of millions of data. But, in developing LLaVA, limited GPUs have not stopped him from continuously updating and optimizing LLaVA.
Caribbean Student Association will connect existing community
“One motivation for me to do this is that companies with hundreds of GPUs can do so much,” Liu said. “We have researchers and talented students from the university that can also do something with the resources we have and can even outperform them.”
He says people can reproduce artificial intelligence systems with their available resources and students can participate in such computation with the open source community to make the race in AI more exciting.
Liu plans on graduating next semester and wants to continue working on LLaVA.
“Right now, LLaVA can only process a single image at very low resolution,” Liu said. “So if you have a very large and complex scene, LLaVA cannot understand every detail.”
Liu also aims to expand LLavA’s capabilities to process videos since it currently operates statistically and can only analyze one image at a time. Additionally, he wants LLaVA to improve sourcing truthful information, whereas other AI systems may confidently provide incorrect information.
“We have an algorithm that can see the world and reason about the world,” Liu said. “There will be a lot of opportunities and potential advances we can make. I am excited about improving LLaVA’s capabilities.”