Word2Vec
fastText
Find more fastText pretrained models at: fastText.
Language | Filename |
---|---|
zh | cc.zh.300.vec.zip |
en | wiki-news-300d-1M.vec.zip |
en | wiki-news-300d-1M-subword.vec.zip |
en | crawl-300d-2M.vec.zip |
en | crawl-300d-2M-subword.zip |
The first line of the file contains the number of words in the vocabulary and the size of the vectors. Each subsequent line contains a word followed by its vector, as in the default fastText text format. Values are space-separated, and words are ordered by descending frequency. These text models can easily be loaded in Python using the following code:
```python
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    # header line: vocabulary size and vector dimension
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        # materialize the values as a list; a bare map() object
        # could only be iterated once
        data[tokens[0]] = list(map(float, tokens[1:]))
    return data
```
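As a quick sanity check, the sketch below writes a tiny two-word `.vec` file in the same text format, loads it with the loader above (repeated here so the snippet is self-contained), and computes a cosine similarity between the two vectors. The file path and toy words are made up for illustration.

```python
import io
import math
import os
import tempfile

def load_vectors(fname):
    # same loader as above, with vectors materialized as lists
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = list(map(float, tokens[1:]))
    return data

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# write a toy 2-word, 3-dimensional file: header line, then word + values
path = os.path.join(tempfile.mkdtemp(), 'toy.vec')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write('2 3\n')
    f.write('king 1.0 0.0 1.0\n')
    f.write('queen 1.0 0.0 0.9\n')

vecs = load_vectors(path)
print(sorted(vecs))                                  # ['king', 'queen']
print(round(cosine(vecs['king'], vecs['queen']), 4))
```

The same pattern applies to the real `.vec` files above; they are simply much larger, so loading them takes a few minutes and several GB of RAM.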
BERT
| | H=128 | H=256 | H=512 | H=768 |
|---|---|---|---|---|
| L=2 | 2/128 (Tiny) | 2/256 | 2/512 | 2/768 |
| L=4 | 4/128 | 4/256 (Mini) | 4/512 (Small) | 4/768 |
| L=6 | 6/128 | 6/256 | 6/512 | 6/768 |
| L=8 | 8/128 | 8/256 | 8/512 (Medium) | 8/768 |
| L=10 | 10/128 | 10/256 | 10/512 | 10/768 |
| L=12 | 12/128 | 12/256 | 12/512 | 12/768 (Base) |
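Each cell in the grid corresponds to a released checkpoint whose name encodes the layer count L, hidden size H, and attention-head count A; in these releases A = H / 64 (for example, BERT-Tiny is `google/bert_uncased_L-2_H-128_A-2` on the Hugging Face hub). A minimal sketch of that naming convention, with the helper name `bert_miniature_name` being my own:

```python
def bert_miniature_name(L, H):
    # Build the checkpoint name used by the google/bert_uncased_* releases.
    # Assumption: head count A follows the released grid, A = H // 64.
    return f"google/bert_uncased_L-{L}_H-{H}_A-{H // 64}"

# nicknames from the table above
named = {(2, 128): 'Tiny', (4, 256): 'Mini', (4, 512): 'Small',
         (8, 512): 'Medium', (12, 768): 'Base'}

for (L, H), nick in named.items():
    print(f"{nick:>6}: {bert_miniature_name(L, H)}")
```

Any of these names can then be passed to `transformers.AutoModel.from_pretrained`, assuming the `transformers` library and network access to the hub.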