An open-source alternative to GPT that can be used to reproduce the language models of the GPT series, including GPT-3.


README-MUP.md

How to use Mup (https://github.com/microsoft/mup)

Add mup neox args to your config

# mup

"use-mup": true,

"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank

"base-shapes-file": "base-shapes", # load base shapes from this file

"coord-check": false, # generate coord check plots to verify mup's implementation in neox

# mup hp search

"mup-init-scale": 1.0,

"mup-attn-temp": 1.0,

"mup-output-temp": 1.0,

"mup-embedding-mult": 1.0,

"mup-rp-embedding-mult": 1.0,

Generate base shapes

  1. Set use-mup to true
  2. Set save-base-shapes to true
  3. Run once. gpt-neox will instantiate a base model and a delta model, then save one file per rank named .. gpt-neox will exit immediately.
  4. Set save-base-shapes to false

Generate coord check plots (optional)

  1. Keep use-mup true
  2. Set coord-check to true
  3. Run once. gpt-neox will output jpg images similar to https://github.com/microsoft/mutransformers/blob/main/README.md#coord-check. gpt-neox will exit immediately.
  4. Set coord-check to false
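
The flag changes in the two procedures above can be summarized in a small helper. This is only a sketch: the flag names come from this README, while the phase names and the function config_for_phase are hypothetical and not part of gpt-neox.

```python
# Hypothetical helper, not part of gpt-neox: it only records which mup flags
# this README says to set before each one-off run and before normal training.
PHASES = {
    # Run once; gpt-neox saves the base shapes and exits.
    "save_base_shapes": {"use-mup": True, "save-base-shapes": True, "coord-check": False},
    # Optional; run once to produce the coord check plots, then gpt-neox exits.
    "coord_check": {"use-mup": True, "save-base-shapes": False, "coord-check": True},
    # Normal training with mup enabled.
    "train": {"use-mup": True, "save-base-shapes": False, "coord-check": False},
}


def config_for_phase(base_config: dict, phase: str) -> dict:
    """Return a copy of base_config with the mup flags set for the given phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}; expected one of {sorted(PHASES)}")
    updated = dict(base_config)  # shallow copy so the caller's config is untouched
    updated.update(PHASES[phase])
    return updated
```

Calling config_for_phase(cfg, "save_base_shapes"), then "coord_check" (optional), then "train" mirrors the order of the steps above.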

Tune mup hyperparameters and LR

The values in the mup hp search section of the config correspond to Appendix F.4 of https://arxiv.org/pdf/2203.03466.pdf. Tune them together with the learning rate (LR) by random search, using the scaled-up config (tested with 6-7B.yml) but with hidden-size set to the value from the scaled-down config (125M.yml).
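
A random search of this kind could be sketched as follows. The README does not give search ranges, so the ranges below, the log-uniform sampling, and the flat "lr" key are all assumptions made for illustration; in a real gpt-neox config the LR lives inside the optimizer settings.

```python
import math
import random

# Assumed ranges, chosen only for illustration; the README does not specify them.
SEARCH_SPACE = {
    "mup-init-scale": (0.1, 10.0),
    "mup-attn-temp": (0.1, 10.0),
    "mup-output-temp": (0.1, 10.0),
    "mup-embedding-mult": (0.1, 10.0),
    "mup-rp-embedding-mult": (0.1, 10.0),
}
LR_RANGE = (1e-4, 1e-1)  # assumed range


def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def proxy_config(scaled_up: dict, scaled_down: dict) -> dict:
    """Scaled-up config with hidden-size taken from the scaled-down config."""
    cfg = dict(scaled_up)
    cfg["hidden-size"] = scaled_down["hidden-size"]
    return cfg


def sample_trial(rng: random.Random, proxy: dict) -> dict:
    """Return one random-search trial: the proxy config plus sampled mup HPs and LR."""
    trial = dict(proxy)
    for name, (low, high) in SEARCH_SPACE.items():
        trial[name] = log_uniform(rng, low, high)
    trial["lr"] = log_uniform(rng, *LR_RANGE)  # "lr" is a placeholder key
    return trial
```

Each trial is then trained at the small width, and the trial with the best validation loss provides the HPs to transfer.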

Transfer

With the best LR and the best mup HPs set, restore the original value of hidden-size in the scaled-up config and run again.
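
The transfer step amounts to copying the tuned values into the scaled-up config while leaving its own hidden-size in place. A minimal sketch, assuming the best trial is available as a dict; the function transfer_config and the flat "lr" key are hypothetical.

```python
# Hyperparameter names as listed in the mup hp search section of the config.
MUP_HP_KEYS = (
    "mup-init-scale",
    "mup-attn-temp",
    "mup-output-temp",
    "mup-embedding-mult",
    "mup-rp-embedding-mult",
)


def transfer_config(scaled_up: dict, best_trial: dict, lr_key: str = "lr") -> dict:
    """Copy the tuned mup HPs and LR into the scaled-up config.

    hidden-size is deliberately taken from scaled_up, not from the trial,
    so the full-width model is trained with the HPs found at small width.
    """
    cfg = dict(scaled_up)
    for key in MUP_HP_KEYS:
        cfg[key] = best_trial[key]
    cfg[lr_key] = best_trial[lr_key]  # lr_key is a placeholder for the real LR location
    return cfg
```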