Word2Vec
fastText
Find more fastText pretrained models at: fastText.
Language | Filename |
---|---|
zh | cc.zh.300.vec.zip |
en | wiki-news-300d-1M.vec.zip |
en | wiki-news-300d-1M-subword.vec.zip |
en | crawl-300d-2M.vec.zip |
en | crawl-300d-2M-subword.zip |
The first line of the file contains the number of words in the vocabulary and the size of the vectors. Each subsequent line contains a word followed by its vector, as in the default fastText text format. Values are space-separated, and words are ordered by descending frequency. These text models can easily be loaded in Python using the following code:
```python
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    # header line: vocabulary size and vector dimension
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        # materialize the values as a list; a bare map() object
        # could only be iterated once
        data[tokens[0]] = list(map(float, tokens[1:]))
    return data
```
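As a quick sanity check, the sketch below writes a tiny two-word `.vec` file in the same text format, loads it with the loader above (repeated here so the snippet is self-contained), and computes a cosine similarity between the two vectors. The file path and toy words are made up for illustration.

```python
import io
import math
import os
import tempfile

def load_vectors(fname):
    # same loader as above, with vectors materialized as lists
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = list(map(float, tokens[1:]))
    return data

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# write a toy 2-word, 3-dimensional file: header line, then word + values
path = os.path.join(tempfile.mkdtemp(), 'toy.vec')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write('2 3\n')
    f.write('king 1.0 0.0 1.0\n')
    f.write('queen 1.0 0.0 0.9\n')

vecs = load_vectors(path)
print(sorted(vecs))                                  # ['king', 'queen']
print(round(cosine(vecs['king'], vecs['queen']), 4))
```

The same pattern applies to the real `.vec` files above; they are simply much larger, so loading them takes a few minutes and several GB of RAM.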
BERT
| | H=128 | H=256 | H=512 | H=768 |
|---|---|---|---|---|
| L=2 | 2/128 (Tiny) | 2/256 | 2/512 | 2/768 |
| L=4 | 4/128 | 4/256 (Mini) | 4/512 (Small) | 4/768 |
| L=6 | 6/128 | 6/256 | 6/512 | 6/768 |
| L=8 | 8/128 | 8/256 | 8/512 (Medium) | 8/768 |
| L=10 | 10/128 | 10/256 | 10/512 | 10/768 |
| L=12 | 12/128 | 12/256 | 12/512 | 12/768 (Base) |
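Each cell in the grid corresponds to a released checkpoint whose name encodes the layer count L, hidden size H, and attention-head count A; in these releases A = H / 64 (for example, BERT-Tiny is `google/bert_uncased_L-2_H-128_A-2` on the Hugging Face hub). A minimal sketch of that naming convention, with the helper name `bert_miniature_name` being my own:

```python
def bert_miniature_name(L, H):
    # Build the checkpoint name used by the google/bert_uncased_* releases.
    # Assumption: head count A follows the released grid, A = H // 64.
    return f"google/bert_uncased_L-{L}_H-{H}_A-{H // 64}"

# nicknames from the table above
named = {(2, 128): 'Tiny', (4, 256): 'Mini', (4, 512): 'Small',
         (8, 512): 'Medium', (12, 768): 'Base'}

for (L, H), nick in named.items():
    print(f"{nick:>6}: {bert_miniature_name(L, H)}")
```

Any of these names can then be passed to `transformers.AutoModel.from_pretrained`, assuming the `transformers` library and network access to the hub.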