An open-source alternative to GPT that can be used to reproduce the various language models in the GPT series, including GPT-3.
```
# mup
"use-mup": true,
"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank
"base-shapes-file": "base-shapes", # load base shapes from this file
"coord-check": false, # generate coord check plots to verify mup's implementation in neox

# mup hp search
"mup-init-scale": 1.0,
"mup-attn-temp": 1.0,
"mup-output-temp": 1.0,
"mup-embedding-mult": 1.0,
"mup-rp-embedding-mult": 1.0,
```
The values under `mup hp search` were added and correspond to Appendix F.4 of https://arxiv.org/pdf/2203.03466.pdf. These values and the learning rate (LR) are tuned with a random search using the scaled-up config (tested with 6-7B.yml), but with `hidden-size` set to the value from the scaled-down config (125M.yml).
Once the best LR and the best mup HPs have been found and set, restore the original value of `hidden-size` in the scaled-up config and run again.
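
The two-step workflow above could be sketched roughly as follows. This is a minimal illustration, not the procedure NeoX itself ships: the sampling ranges, the log-uniform distribution, the flat `"lr"` key, the helper names, and the widths `768` and `4096` are all assumptions made for the example. Read the real `hidden-size` values from 125M.yml and 6-7B.yml, and place the learning rate wherever your config actually defines it.

```python
import math
import random

# Keys taken from the "mup hp search" block of the config.
MUP_HP_KEYS = [
    "mup-init-scale",
    "mup-attn-temp",
    "mup-output-temp",
    "mup-embedding-mult",
    "mup-rp-embedding-mult",
]


def log_uniform(rng, low, high):
    """Draw a sample whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_trial(rng, small_hidden_size, hp_range=(0.25, 4.0), lr_range=(1e-4, 1e-2)):
    """Build overrides for one narrow-width search run.

    hp_range and lr_range are illustrative assumptions, not recommended values.
    """
    overrides = {key: log_uniform(rng, *hp_range) for key in MUP_HP_KEYS}
    overrides["lr"] = log_uniform(rng, *lr_range)  # hypothetical flat key
    overrides["hidden-size"] = small_hidden_size   # width from the scaled-down config
    return overrides


def final_overrides(best_trial, full_hidden_size):
    """Keep the tuned LR and mup HPs, but restore the scaled-up width."""
    final = dict(best_trial)
    final["hidden-size"] = full_hidden_size
    return final


rng = random.Random(0)  # fixed seed so the search is reproducible
trials = [sample_trial(rng, small_hidden_size=768) for _ in range(8)]
```

Each entry in `trials` would be merged into the scaled-up config for one search run. After picking the trial with the lowest loss, `final_overrides(best, 4096)` yields the settings for the full-width run.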