﻿<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/">
<channel>
<title><![CDATA[花瓣笺 - 2008 - 4]]></title>
<link>http://blog.lvdaocn.com/index.php/feed/2008/04/</link>
<language>zh-cn</language>
<description><![CDATA[Friends who show me some respect all call me Hua Ge (花哥)....]]></description>
<pubDate>Mon, 06 Feb 2012 19:47:54 -0800</pubDate>
<item>
<title><![CDATA[A simple Chinese word segmentation algorithm in JavaScript]]></title>
<link>http://blog.lvdaocn.com/index.php/archives/23/</link>
<pubDate>Thu, 24 Apr 2008 07:43:41 +0000</pubDate>
<category><![CDATA[Engineer Hua Ge]]></category>
<description><![CDATA[This was a lab assignment for one of my major courses. The requirements were as follows: ...]]></description>
<guid>http://blog.lvdaocn.com/index.php/archives/23/</guid>
<slash:comments>1</slash:comments>
<comments>http://blog.lvdaocn.com/index.php/archives/23/#comments</comments>
<content:encoded><![CDATA[<p>This was a lab assignment for one of my major courses. The requirements were as follows:</p>
<p class="MsoNormal">1. Merge the stop-word list and the keyword list into a single dictionary, then segment every article title with reverse maximum matching (longest match first, scanning from the end of the string) and give the segmentation for each title. Display each title's number, the title itself, and its segmentation result on screen.</p>
<p class="MsoNormal">2. Remove the stop words (shown on screen).</p>
<p class="MsoNormal">3. Using the tfx term-frequency weighting formula, compute the frequency of each word and display the weight of every word in each title on screen.</p>
<p class="MsoNormal">4. Given a threshold entered by the user, select the index terms and display them on screen.</p>
<p class="MsoNormal">5. Given an indexing depth entered by the user, select the index terms and display them on screen.</p>
<p class="MsoNormal">Since there was no restriction on the development environment, I picked the simplest option: JS -_-</p>
<p class="MsoNormal">One important caveat:</p>
<p class="MsoNormal">This demo only implements the algorithm and is in no way fit to be used as a real application.</p>
<p class="MsoNormal">Demo:&nbsp; <a title="A simple Chinese word segmentation algorithm in JavaScript" href="http://re.lvdaocn.com/static/demo/fenci.htm" target="_blank">A simple Chinese word segmentation algorithm in JS</a></p>
<p class="MsoNormal">Download: right-click -&gt; Save Page As&nbsp; -_-</p>
<p class="MsoNormal"><span style="color: #ff00ff;">P.S. I have noticed that a lot of junior schoolmates are finding this page through the search tools we are all so good at. A gentle reminder: in four years of university, it is about time you did at least one assignment properly yourself :)</span></p>]]></content:encoded>
<author><![CDATA[花哥]]></author>
<dc:creator><![CDATA[花哥]]></dc:creator>
<wfw:commentRss>http://blog.lvdaocn.com/index.php/feed/archives/23/</wfw:commentRss>
</item>
</channel>
</rss>
