Idea

A model's knowledge-acquisition and output-generation parts appear to form with some degree of separation. This approach deliberately induces that structure during training.

Methodology

Details TBD

Things I’ve done #1

Using https://huggingface.co/yanolja/EEVE-Korean-10.8B-v1.0 as the base model, an Instruct model was created by training only layers 46 and 47 of its 48 layers (indexed 0-47).

Subsequently, raw data training was conducted on layers 0-45 of the base model.

Finally, layers 0-45 from the raw data training were merged with layers 46-47 from the Instruct model.
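The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual training script: it assumes Llama-style parameter names such as `model.layers.46.mlp.up_proj.weight`, and the helpers `layer_index`, `freeze_except`, and `merge_by_layer` are hypothetical names introduced here. How non-layer parameters (embeddings, final norm, LM head) were handled is not stated above; the sketch freezes them during training and takes them from the raw-trained model when merging, which is an assumption.

```python
import re

# Assumption: Llama-style names, e.g. "model.layers.46.mlp.up_proj.weight".
LAYER_RE = re.compile(r"\blayers\.(\d+)\.")


def layer_index(param_name):
    """Return the decoder-layer index found in a parameter name, or None."""
    match = LAYER_RE.search(param_name)
    return int(match.group(1)) if match else None


def freeze_except(model, trainable_layers):
    """Keep only the given layers trainable and return their parameter names.

    Assumption: parameters outside any decoder layer (embeddings, final norm,
    LM head) are frozen as well.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = layer_index(name) in trainable_layers
        if param.requires_grad:
            trainable.append(name)
    return trainable


def merge_by_layer(raw_state, instruct_state, instruct_layers):
    """Start from the raw-trained weights and overwrite the listed layers
    with the weights from the Instruct model.

    Assumption: non-layer parameters are kept from raw_state.
    """
    merged = dict(raw_state)
    for name, tensor in instruct_state.items():
        if layer_index(name) in instruct_layers:
            merged[name] = tensor
    return merged
```

With `transformers`, the Instruct run would call something like `freeze_except(model, {46, 47})`, the raw-data run `freeze_except(model, set(range(46)))`, and the final step `merge_by_layer(raw_model.state_dict(), instruct_model.state_dict(), {46, 47})` followed by `load_state_dict` on a fresh copy of the base model.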

Result

Good ^^

How can completion (raw) data be trained into a chat model? - AI Language Model Local Channel

Weight

https://huggingface.co/maywell/EEVE-Korean-Instruct-10.8B-Var

https://huggingface.co/maywell/EEVE-Korean-Instruct-10.8B-Var-Shakespeare

https://huggingface.co/maywell/EEVE-Korean-Instruct-10.8B-Var-RawShakespeare

Things I’ve done #2

Diff Visualization

Distributions observed in existing models were analyzed to confirm the hypothesis.
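The statistic behind the visualization is not specified here. One plausible way to look at such a distribution is the mean absolute weight difference per layer between a base checkpoint and its fine-tuned counterpart. The sketch below assumes Llama-style parameter names and weights already converted to NumPy-compatible arrays; `per_layer_mean_abs_diff` is a hypothetical helper, not the code used for the analysis.

```python
import re
from collections import defaultdict

import numpy as np

# Assumption: Llama-style names, e.g. "model.layers.12.self_attn.q_proj.weight".
LAYER_RE = re.compile(r"\blayers\.(\d+)\.")


def per_layer_mean_abs_diff(state_a, state_b):
    """Return {layer index: mean |a - b|} over all weights in that layer.

    Parameters outside decoder layers, or missing from state_b, are skipped.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for name, a in state_a.items():
        match = LAYER_RE.search(name)
        if match is None or name not in state_b:
            continue
        layer = int(match.group(1))
        a_arr = np.asarray(a, dtype=np.float64)
        b_arr = np.asarray(state_b[name], dtype=np.float64)
        diff = np.abs(a_arr - b_arr)
        total[layer] += float(diff.sum())
        count[layer] += diff.size
    return {layer: total[layer] / count[layer] for layer in sorted(total)}
```

Plotting the returned values against the layer index would show whether the changes concentrate in a few layers or spread across the whole stack.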