Based on the observation that the parts of a model responsible for acquiring knowledge and for generating output form with some degree of separation, this approach deliberately induces that structure.
Details: TBD.
Using https://huggingface.co/yanolja/EEVE-Korean-10.8B-v1.0 as the base model, an Instruct model was created by training only layers 46 and 47 of its 48 layers (0-47).
Next, layers 0-45 of the base model were trained on raw (completion) data.
Finally, layers 0-45 from the raw-data-trained model were merged with layers 46-47 from the Instruct model.
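The two training stages differ only in which decoder layers receive gradients. The following is a minimal sketch of selecting those layers. It assumes Llama-style parameter names (`model.layers.<i>.…`), as used by SOLAR-derived models such as EEVE. The helpers `layer_index` and `trainable_mask` are hypothetical. The sketch also assumes that non-layer parameters (embeddings, final norm, `lm_head`) stay frozen in both stages, which the source does not specify.

```python
import re

# Hypothetical helper: pull the decoder-layer index out of a Llama-style
# parameter name such as "model.layers.46.mlp.down_proj.weight".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(param_name):
    """Return the decoder-layer index in param_name, or None for non-layer params."""
    match = _LAYER_RE.search(param_name)
    return int(match.group(1)) if match else None


def trainable_mask(param_names, trainable_layers):
    """Map each parameter name to True if its decoder layer should be trained.

    Assumption: parameters outside the decoder layers (embeddings, final
    norm, lm_head) are kept frozen, since layer_index returns None for them.
    """
    trainable = set(trainable_layers)
    return {name: layer_index(name) in trainable for name in param_names}


# Stage 1 (Instruct model): only the last two of the 48 layers are trained.
INSTRUCT_LAYERS = range(46, 48)
# Stage 2 (raw-data training on the base model): layers 0-45 are trained.
RAW_LAYERS = range(0, 46)
```

With PyTorch, the mask could be applied by setting `param.requires_grad = mask[name]` while iterating over `model.named_parameters()`.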
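The final merge amounts to assembling one state dict from two checkpoints, choosing each decoder layer by its index. Below is a minimal sketch over plain dicts that map parameter names to weights. It assumes Llama-style names, and the function `merge_by_layer` is hypothetical. The source does not say which checkpoint supplies non-layer parameters (embeddings, final norm, `lm_head`), so that choice is exposed as an assumption-labelled argument.

```python
import re

# Matches the decoder-layer index in names like "model.layers.46.mlp.up_proj.weight".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(param_name):
    """Return the decoder-layer index in a Llama-style name, or None."""
    match = _LAYER_RE.search(param_name)
    return int(match.group(1)) if match else None


def merge_by_layer(raw_state, instruct_state, instruct_layers, non_layer_source="raw"):
    """Combine two state dicts that share the same parameter names.

    Layers listed in instruct_layers are copied from instruct_state; every
    other decoder layer comes from raw_state. Non-layer parameters come from
    the checkpoint named by non_layer_source ("raw" or "instruct"), which is
    an assumption because the source does not specify it.
    """
    if raw_state.keys() != instruct_state.keys():
        raise ValueError("checkpoints must have identical parameter names")
    chosen = set(instruct_layers)
    merged = {}
    for name, raw_value in raw_state.items():
        idx = layer_index(name)
        if idx is None:
            use_instruct = non_layer_source == "instruct"
        else:
            use_instruct = idx in chosen
        merged[name] = instruct_state[name] if use_instruct else raw_value
    return merged
```

With real checkpoints, the inputs could be each fine-tuned model's `state_dict()`. The result would then be loaded into a fresh copy of the base model with `load_state_dict`. This works because both models share the same architecture and parameter names.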
Good ^^
How can a chat model be trained on completion (raw) data? - AI Language Model Local Channel
https://huggingface.co/maywell/EEVE-Korean-Instruct-10.8B-Var
https://huggingface.co/maywell/EEVE-Korean-Instruct-10.8B-Var-Shakespeare
https://huggingface.co/maywell/EEVE-Korean-Instruct-10.8B-Var-RawShakespeare
Distributions observed in existing models were analyzed to confirm the hypothesis.