Generalization and Zeros
Published: 2019-06-24

Two Questions

Overfitting

Moving from unigram to bigram to trigram to quadrigram models, prediction improves: the quadrigram predicts better than the trigram, which predicts better than the bigram, which predicts better than the unigram.

But N-grams only work well for word prediction if the test corpus looks like the training corpus. In real life it often doesn't, so we should train robust models that do a better job of generalizing.
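
As a minimal sketch of this (the toy corpus and function name are my own illustration, not from the original post), here is a maximum-likelihood bigram model in Python; its estimates are nothing more than counts from the training corpus, so any word pair absent from training gets probability zero at test time:

```python
from collections import Counter

def train_bigram_mle(tokens):
    """Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigram_counts[w1]
            for (w1, w2), c in bigram_counts.items()}

train = "i want chinese food i want english food".split()
model = train_bigram_mle(train)

print(model.get(("i", "want"), 0.0))        # 1.0 -- seen in training
print(model.get(("want", "spanish"), 0.0))  # 0.0 -- unseen pair gets zero
```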

Zeros

Firstly, with a vocabulary of V words, a bigram model has V^2 possible bigrams, and most of their probabilities are zero because they never appear in the training set. What's worse, a quadrigram model has V^4 possible quadrigrams, so even more of its probabilities are zero.
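
A quick back-of-the-envelope sketch of this sparsity (the vocabulary size and bigram count below are illustrative assumptions, not figures from the post):

```python
V = 20_000              # assumed vocabulary size (illustrative)
seen_bigrams = 300_000  # assumed distinct bigrams observed in training

possible = V ** 2       # 400,000,000 possible bigrams
zero_fraction = 1 - seen_bigrams / possible
print(f"{zero_fraction:.4%} of bigram probabilities are zero")  # 99.9250%

print(f"{V ** 4:.1e} possible quadrigrams")  # 1.6e+17 -- even sparser
```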

Secondly, some n-grams that never occurred in the training set do occur in the test data; their estimated probability is zero, so we can never compute perplexity on that test set. This is a big problem we need to solve.
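
A hedged sketch of why perplexity breaks (the function and toy probabilities are my own illustration): perplexity is exp(-(1/N) * sum of log P), so a single zero-probability test bigram makes the log undefined and the perplexity infinite.

```python
import math

def perplexity(test_bigrams, model):
    """Perplexity = exp(-(1/N) * sum(log P(w2|w1))); one zero makes it infinite."""
    log_prob = 0.0
    for bg in test_bigrams:
        p = model.get(bg, 0.0)
        if p == 0.0:
            return math.inf  # log(0) is undefined: an unseen test bigram breaks perplexity
        log_prob += math.log(p)
    return math.exp(-log_prob / len(test_bigrams))

# Toy bigram probabilities (illustrative values only)
model = {("i", "want"): 1.0, ("want", "chinese"): 0.5}
print(perplexity([("i", "want"), ("want", "chinese")], model))  # ~1.41, finite
print(perplexity([("want", "spanish")], model))                 # inf -- unseen bigram
```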

Reposted from: https://www.cnblogs.com/chuanlong/archive/2013/04/25/3042508.html
