
Tokyo Tech, Tohoku University, Fujitsu, and RIKEN start collaboration on distributed training of large language models on the supercomputer Fugaku
The Tokyo Institute of Technology (Tokyo Tech), Tohoku University, Fujitsu Limited, and RIKEN announced today that they will begin research and development on distributed training of Large Language Models (LLMs)(1) on the supercomputer Fugaku in May 2023, within the scope of Japan's policy-defined Fugaku use initiative.
An LLM is a deep learning AI model that serves as the core of generative AI technologies such as ChatGPT(2). The four organizations aim to improve the environment for creating LLMs that can be widely used by academia and companies, to contribute to enhancing AI research capabilities in Japan, and to increase the value of using Fugaku in both academia and industry by publishing the results of this R&D in the future.
Background
While many anticipate that LLMs and generative AI will play a fundamental role in the research and development of technologies for security, the economy, and society as a whole, advancing and refining these models requires high-performance computing resources that can efficiently process large amounts of data.
Tokyo Tech, Tohoku University, Fujitsu, and RIKEN are undertaking an initiative to this end which will focus on research and development towards LLM distributed training.
Implementation period
From 24 May 2023 to 31 March 2024 *Period of the initiative for Japan's policy-defined use of Fugaku
The role of each organization and company
The technology developed in this initiative will enable the organizations to efficiently train large language models in the large-scale parallel computing environment of the supercomputer Fugaku; an illustrative sketch of such distributed training follows the list below. The roles of each organization and company are as follows:
- Tokyo Institute of Technology: Overall process supervision, parallelization and acceleration of the LLM
- Tohoku University: Learning data collection, model selection
- Fujitsu: Acceleration of the LLM
- RIKEN: Distributed parallelization and LLM communication acceleration, LLM acceleration
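For illustration only (this is not the project's actual codebase), distributed training of the kind described above is commonly expressed as data parallelism: each process holds a replica of the model, and gradients are averaged across processes after every backward pass. Below is a minimal sketch using PyTorch's DistributedDataParallel with the CPU-friendly gloo backend; the model, data, and hyperparameters are placeholders, not the project's actual setup.

```python
# Minimal data-parallel training sketch (illustrative; not the project's code).
# Launch with e.g.: torchrun --nproc_per_node=4 train_sketch.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE/MASTER_ADDR; gloo runs on CPU-only nodes
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    model = torch.nn.Transformer(d_model=256, nhead=4)  # placeholder model
    ddp_model = DDP(model)  # wraps the replica; all-reduces gradients
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    # Placeholder batches, shaped (sequence, batch, embedding)
    src = torch.rand(10, 8, 256)
    tgt = torch.rand(10, 8, 256)

    for step in range(3):
        optimizer.zero_grad()
        out = ddp_model(src, tgt)
        loss = out.pow(2).mean()  # dummy loss for demonstration
        loss.backward()           # gradient averaging across ranks happens here
        optimizer.step()
        if rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Training at Fugaku's scale typically also involves tensor and pipeline parallelism and communication tuning, which is the kind of acceleration work the roles above divide among the partners.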
Future plan
In order to support Japanese researchers and engineers in developing LLMs in the future, the four organizations plan to publish the research results obtained through this initiative for Japan's policy-defined use of Fugaku on GitHub(3) and Hugging Face(4) in fiscal 2024. It is also anticipated that many researchers and engineers will participate in improving the foundational models and in new applied research, creating efficient methods that lead to innovative next-generation research and business results.
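For context, once a model is published on Hugging Face it can typically be loaded with the transformers library, as in the sketch below. The repository name `example-org/fugaku-llm` is a hypothetical placeholder, not an announced identifier.

```python
# Hypothetical usage sketch; "example-org/fugaku-llm" is a placeholder
# repository name, not an announced model identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "example-org/fugaku-llm"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Generate a short continuation from a Japanese prompt
inputs = tokenizer("日本のスーパーコンピュータ「富岳」は", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```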
The four organizations will also consider collaborating with Nagoya University, which develops data generation and learning methods for multimodal applications in industries such as manufacturing, and CyberAgent, Inc., which provides data and technology for building LLMs.
Comments
Comments from Toshio Endo, Professor, Center for Global Scientific Information and Computing, Tokyo Institute of Technology:
“This collaboration will integrate the parallelization and acceleration of large-scale language models on the supercomputer Fugaku by Tokyo Tech and RIKEN, Fujitsu’s development of high-performance computing infrastructure software for Fugaku and enhancement of AI model performance, and Tohoku University’s natural language processing technology. In collaboration with Fujitsu, we will also utilize the collaborative research laboratory we established as the “Fujitsu Collaborative Research Center for Next Generation Computing Infrastructure” in 202X. We look forward to working with our colleagues to contribute to the enhancement of Japan’s AI research capabilities, taking advantage of the large-scale distributed deep learning capabilities offered by Fugaku.”
Comments from Kentaro Inui, Professor, Graduate School of Information Science, Tohoku University:
“We aim to build a large-scale language model that is open source, available for commercial use, and based primarily on Japanese data, with transparency in its training data. By making the training data traceable, we anticipate that the model will facilitate research robust enough to scientifically examine issues such as the black-box problem, bias, misinformation, and the so-called “hallucination” phenomena common in AI. Leveraging the insights into deep learning for Japanese natural language processing developed at Tohoku University, we will build this large-scale model, and we look forward to contributing to the enhancement of AI research capabilities in Japan and beyond by sharing the results obtained through this initiative with researchers and developers.”
Comments from Seishi Okamoto, EVP, Head of Fujitsu Research, Fujitsu Limited:
“We are excited about the opportunity to leverage the powerful parallel computing resources of the supercomputer Fugaku to enhance AI research and advance LLM research and development. Going forward, we aim to incorporate the results of this research into Fujitsu’s new AI platform, codenamed “Kozuchi”, to deliver paradigm-shifting applications that contribute to the creation of a sustainable society.”
Comment from Satoshi Matsuoka, Director, RIKEN Center for Computational Science:
“The A64FX(5) CPU is equipped with an AI acceleration function known as SVE. However, software development and optimization are critical to maximizing its capabilities and leveraging them for AI applications. We believe this joint research will play an important role in bringing together Japan’s LLM and computer science experts, including researchers and engineers from RIKEN R-CCS, to advance techniques for building LLMs on the supercomputer Fugaku. Together with our collaborators, we will contribute to making Society 5.0 a reality.”
Project name
Distributed Training of Large Language Models on Fugaku (Project Number: hp230254)
Notes
(1) Large language models: Neural networks with hundreds of millions to billions of parameters, pre-trained on large amounts of data. GPT in language processing and ViT in image processing are recognized as representative recent large-scale pre-trained models.
(2) ChatGPT: A large-scale language model for natural language processing developed by OpenAI that supports tasks such as interactive dialogue and automatic sentence generation with high accuracy.
(3) GitHub: A platform used to publish open source software worldwide.
(4) Hugging Face: A platform used to publish AI models and datasets around the world.
(5) A64FX: An ARM-based CPU developed by Fujitsu and installed in the supercomputer Fugaku.
###
About the Tokyo Institute of Technology
Tokyo Tech stands at the forefront of research and higher education as the leading university for science and technology in Japan. Tokyo Tech researchers excel in fields ranging from materials science to biology, computer science, and physics. Founded in 1881, Tokyo Tech hosts over 10,000 undergraduate and graduate students annually, who develop into scientific leaders and some of the most sought-after engineers in the industry. Embodying the Japanese philosophy of “monotsukuri,” which means “technical ingenuity and innovation,” the Tokyo Tech community strives to contribute to society through high-impact research. https://www.titech.ac.jp/english/