This organization showcases language model pretraining with the awesome TensorFlow Model Garden library.
The following LMs are currently supported:
The following LMs were pretrained on the 10BT subsets of the famous FineWeb and FineWeb-Edu datasets:
To find the best checkpoints and to compare our FineWeb-LMs to other models (BERT, ELECTRA and RoBERTa), we perform an evaluation using the great ScandEval library.
Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
---|---|---|---|---|---|
model-garden-lms/bert-base-finewebs-951k | 69.41 | 89.25 ± 0.4 / 88.9 ± 0.37 | 58.17 ± 1.26 / 59.86 ± 1.65 | 58.83 ± 3.46 / 78.22 ± 2.11 | 55.66 ± 1.19 / 66.36 ± 1.42 |
model-garden-lms/bert-base-token-dropping-finewebs-901k | 68.01 | 88.98 ± 0.64 / 88.67 ± 0.55 | 57.79 ± 1.31 / 58.91 ± 1.85 | 54.25 ± 6.3 / 75.73 ± 3.54 | 54.4 ± 0.72 / 65.31 ± 1.01 |
model-garden-lms/teams-base-finewebs-1m | 72.64 | 89.27 ± 0.41 / 88.82 ± 0.41 | 59.58 ± 0.64 / 62.63 ± 3.0 | 66.72 ± 0.94 / 83.01 ± 0.45 | 59.95 ± 0.71 / 71.13 ± 0.58 |
google-bert/bert-base-cased | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
google/electra-base-discriminator | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
FacebookAI/roberta-base | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |
On the average score, the TEAMS model (72.64) outperforms ELECTRA (69.26) and RoBERTa (68.96), both of which were trained on much more data and for more pretraining steps. All detailed results can be found in this dataset repository.
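The `Avg. Score` column can be cross-checked against the per-task cells. The sketch below assumes that the average is the plain mean of all eight reported values per model (two per task, ignoring the ± spreads). This is an inference from the numbers in the table, not a documented ScandEval formula, and the helper names `cell_means` and `average_score` are made up for illustration.

```python
from statistics import mean

# Metric cells copied from the table above, one list per model,
# in column order: CoNLL-En, SST5, ScaLA-En, SQuAD.
ROWS = {
    "model-garden-lms/teams-base-finewebs-1m": [
        "89.27 ± 0.41 / 88.82 ± 0.41",
        "59.58 ± 0.64 / 62.63 ± 3.0",
        "66.72 ± 0.94 / 83.01 ± 0.45",
        "59.95 ± 0.71 / 71.13 ± 0.58",
    ],
    "google/electra-base-discriminator": [
        "87.82 ± 0.69 / 86.83 ± 0.62",
        "62.3 ± 1.12 / 55.93 ± 0.67",
        "62.61 ± 1.21 / 80.85 ± 0.59",
        "52.51 ± 0.86 / 65.2 ± 0.85",
    ],
    "FacebookAI/roberta-base": [
        "90.35 ± 0.23 / 90.14 ± 0.2",
        "60.95 ± 1.4 / 57.52 ± 1.97",
        "50.64 ± 1.69 / 74.55 ± 0.9",
        "57.82 ± 1.35 / 69.68 ± 1.02",
    ],
}


def cell_means(cell: str) -> list[float]:
    """Return the two central values of an 'a ± b / c ± d' cell (spreads dropped)."""
    return [float(part.split("±")[0]) for part in cell.split("/")]


def average_score(cells: list[str]) -> float:
    """Assumed aggregation: unweighted mean over every value in every task cell."""
    values = [value for cell in cells for value in cell_means(cell)]
    return mean(values)


if __name__ == "__main__":
    for model_id, cells in ROWS.items():
        print(f"{model_id}: {average_score(cells):.2f}")
```

Under this assumption, the computed averages for the three models agree with the `Avg. Score` column after rounding to two decimals.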
This repository is the outcome of the last two years of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.
Made from Bavarian Oberland with ❤️ and 🥨.