This post was last edited by 狄霽顥嵚 on 2015-1-27 at 19:25
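1. Preface: I saw a thread in the admin section about adding a CAPTCHA to voting. It reminded me of an article on the subject that I had read earlier in the tech discussion section. I couldn't find the original post, so I collected and edited the material myself. Have a look and learn something new~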
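If you spend much time online, you probably run into CAPTCHAs almost every day. But did you know? Every time you impatiently type in a hard-to-read word, you are doing more than blocking spam registrations and comments: you are quietly making a small contribution to human civilization.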
CAPTCHA also has a long-winded full name: "Completely Automated Public Turing test to tell Computers and Humans Apart". Many download sites outside China use Google's reCAPTCHA, which anyone can sign up for free, and reCAPTCHA is the one making this quiet contribution. @JimmyLiye translated an article from Google that tells this story, which sounds hard to believe at first.
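The story starts with OCR. OCR stands for optical character recognition; simply put, it scans text that exists only as an image and recognizes the characters. The problem is that its accuracy is still nothing to write home about, and it often produces errors everywhere. Look: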
One day, a machine scanned a book to turn it into an electronic edition:
The output looked like this, which is barely readable:
The Hreckinridge' and Lane Democrats, having taken courage at the recent eastern advises, are [xxxxxxxxxx] energetically for the campaign: Several prominent Democrats who at first favoredDonoLea, are coming out. for the other aide, apparently under the [xxxxxxxx] of Federal [xxxxxxxxx]. An address to the National Democracy of ,1ifornia, urging the party to supportHaeeslipslDas, has recently been published, which manifestlybss strengthened that aide of the [xxxxxxxxx]: It is signed by 65 Democrats, many of whom occupy respectab e and prominent positions in the party, 22 of them are Federal office-holders,[xxxxx] more are recipients of Federal patronage, and the others represent a mass of politicians giving the document [xxxx][xxxxxx] mTheDcu8las Democrats are also active The Irish and German vote will mostly go with ths# branch of the party, but it is[xxxxxxxxx] to [xxxxxxxx] [xxxxx] [xxxx] [xx] the stronger. Thus far 17 IT newspapers have declared for DonGres, 13 for Base$- IaaIDGS and 9 remain non-committal, with even chances of going either way. Under these circumstances the Republicans entertain not unjustifiable hopes that the Democratic divisions may be so equal,- ly balanced as to give the State [xx] LIaCOLV.Same very [xxxxxxx] Bell and Everett meetings have been held in different parts of the State, bat thus far that party does not exhibit much rank sad ale air en.
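And here is another one, where the original book was in poor condition:
Faced with this, the computer was stumped and spat out a pile of incomprehensible garbage: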
‘ letz-1- rrk fit: 1′ . on its to Vc ,rt, cann into tlm yc H_ tcr,la, .n. ‘l l; , arc ti:( h of thc 1″,ats that to ltc rc: ,;. , I; ., l: rel!;n. tani., , ./olio, IJuteilu, . 1!’i./_ ;lr”n. Iiam! Jr.r. F’l,nr_.Z.._%i;;, ,, : rt-Irn: am/ tf.rri.:, t?m steamer as a tr nW r. Uu ,tin;t, c ac?1 1″,at firm/ a t;nn, accor.liu; to .t rn. ‘Cl.w r. wu ru lm:nui MistinW /y in u;th, -. ink ;:,k as to “what w ax 1111, :111(I vle:iR a of ;: (,am( into, mnr r-, tm if tlm wo r( uu.i n:’ of t?u : la?:Iv. \ ‘c : ol in thc , ucr:atic , , Tlau :; will h:aw tu-li.r \. ’1′Im yap?tts Il ,,n an,/ I, ,rr:l. r, (,t tf,is r:ity, start witli it, with lu:rtic: ol \ 1- e:l.k.
Can you read that? I certainly can't.
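One of reCAPTCHA's goals is to change this. The figure below explains how it works: (1) scan the book; (2) extract the words that OCR cannot recognize; (3) distort them further and add random lines to make them harder for bots; (4) combine two words into one CAPTCHA for the user to solve.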
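Steps 2 and 4 can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: it assumes the OCR engine reports a confidence score per word and that a hypothetical threshold of 0.8 separates "readable" words from unreadable ones (the post says the control word is one humans have already verified, so the threshold is a stand-in for that). `ScannedWord`, `split_words`, `build_challenges`, and the dictionary keys are all hypothetical names, and step 3 (distortion) is left out because it happens when the images are rendered.

```python
import random
from dataclasses import dataclass

@dataclass
class ScannedWord:
    image_id: str      # hypothetical ID of the cropped word image
    ocr_text: str      # OCR's best guess for the word
    confidence: float  # hypothetical 0.0-1.0 score reported by the OCR engine

# Assumption: guesses below this score count as "OCR could not read it".
CONFIDENCE_THRESHOLD = 0.8

def split_words(words):
    """Step 2: separate words OCR read reliably from words it could not."""
    known = [w for w in words if w.confidence >= CONFIDENCE_THRESHOLD]
    unknown = [w for w in words if w.confidence < CONFIDENCE_THRESHOLD]
    return known, unknown

def build_challenges(words, rng=None):
    """Step 4: pair every unreadable word with a readable control word.

    Step 3 (extra distortion and random lines) would be applied when the
    two images are rendered and is not modeled here.
    """
    rng = rng or random.Random()
    known, unknown = split_words(words)
    if not known:
        return []  # no control words available, so no gradable challenge
    challenges = []
    for u in unknown:
        control = rng.choice(known)
        images = [control.image_id, u.image_id]
        rng.shuffle(images)  # the user must not know which word is the control
        challenges.append({
            "images": images,                    # shown to the user
            "control_id": control.image_id,      # kept on the server
            "control_answer": control.ocr_text,  # used to grade the user
            "unknown_id": u.image_id,            # the word being transcribed
        })
    return challenges
```

Shuffling the two images matters: if the control word always appeared in the same position, a bot could answer only that one and skip the other.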
With its help, the text from the second image became mostly readable (though a few small errors remain):
The New-York State yacht Squadron, on its annual cruise to Newport came into the harbor yesterday afternoon. The following are the names of the boats that came to anchor here: Jessie, gera loliv erelun Annie, Mannering, Julia, Bonita, Magic wut, Rambler, floumblie, Henrietta, Sea-Drift and Maria, with the steamer America as a tender. On anchoring each boat fired a gun, according to custom. The reports were heard distinctly in the city, causing considerable inquiry as to "what was up", and quite a number of sanguine individuals came into our office to inquire if the guns were not annunciatory signals of the successful laying of the Atlantic Cable. We invariably replied in the negative. The squadron will leave to-day for Newport. The yachts Washington and buub r of this city, start with it, with parties of New Haven people.
Some of you may ask: if the machine cannot read the word, how does it know whether you typed it correctly? Good question, and Google's solution is clever:
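Of the two words in the CAPTCHA, one is a control word whose answer has already been verified by humans, and the other is an unknown word the machine could not read. If you type the control word correctly, the system assumes your answer for the other word is correct as well. That way, every CAPTCHA you solve adds a word to humanity's store of knowledge.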
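A minimal sketch of that grading rule, assuming a challenge is a dict holding the control word's verified answer and an ID for the unknown word's image (both hypothetical key names). The user is graded on the control word alone; only when it is right is their reading of the unknown word kept. Comparing the control word case-insensitively is also an assumption.

```python
from collections import defaultdict

def check_answer(challenge, control_typed, unknown_typed, guesses):
    """Grade the user on the control word only.

    challenge: dict with hypothetical keys "control_answer" (the verified
               text of the control word) and "unknown_id" (which scanned
               word image the other half of the CAPTCHA shows).
    guesses:   mapping from unknown_id to the readings collected so far.
    """
    # Assumption: the control word is compared case-insensitively.
    if control_typed.strip().lower() != challenge["control_answer"].lower():
        return False  # control word wrong: reject and record nothing
    # Control word right: trust the user's reading of the unknown word too.
    guesses[challenge["unknown_id"]].append(unknown_typed.strip())
    return True

guesses = defaultdict(list)
challenge = {"control_answer": "harbor", "unknown_id": "img-42"}
check_answer(challenge, "Harbor", "Newport", guesses)  # accepted
check_answer(challenge, "harber", "Newpert", guesses)  # rejected
```

After these two calls, only the first reading ("Newport") has been recorded for `img-42`, because the second user got the control word wrong.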
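(A word is not settled by a single user but by many users. Even if people's readings differ, the more answers are collected, the closer the final result gets to the correct one. — note from the original poster)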
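The "many users" idea can be sketched as a simple vote. The post gives no numbers, so the minimum of 3 readings and the 60% agreement share below are hypothetical; the point is only that a word is accepted once enough independent readings agree, and is kept in circulation otherwise.

```python
from collections import Counter

# Hypothetical thresholds; the post does not say what Google actually uses.
MIN_VOTES = 3        # readings required before deciding at all
MIN_AGREEMENT = 0.6  # share of readings the top answer must reach

def resolve_word(readings):
    """Return the consensus reading of one unknown word, or None if undecided."""
    if len(readings) < MIN_VOTES:
        return None  # too few users have seen this word yet
    counts = Counter(r.strip() for r in readings)
    best, votes = counts.most_common(1)[0]
    if votes / len(readings) >= MIN_AGREEMENT:
        return best
    return None  # users still disagree: keep showing this word
```

For example, `resolve_word(["Newport", "Newport", "Newp0rt"])` returns `"Newport"`, since 2 of 3 readings (about 67%) agree, while three completely different readings return `None`.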
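2. The gist of this second part is that we don't have to abolish the CAPTCHA; we can simply change how it works. (I thought of this after stumbling on it while searching for material.)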
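Baidu's CAPTCHA already works this way: it uses Chinese characters, shows a 3×3 grid below the prompt, and you just click the matching characters. *Installing SP1 first, will finish editing later*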
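A click-based grid CAPTCHA like the one described above could be sketched as follows. This is not Baidu's implementation: the function names, the dict layout, and the rule that characters must be clicked in the prompted order are all assumptions made for illustration.

```python
import random

GRID_SIZE = 9  # 3x3 grid; cells numbered 0-8, left to right, top to bottom

def make_grid_challenge(targets, decoys, rng=None):
    """Hide the prompt characters among decoys in a shuffled 3x3 grid.

    targets: distinct characters the user is asked to click, in order
    decoys:  distinct other characters; at least 9 - len(targets) of them
             must remain after removing any that appear in targets
    """
    rng = rng or random.Random()
    pool = [c for c in decoys if c not in targets]
    cells = list(targets) + rng.sample(pool, GRID_SIZE - len(targets))
    rng.shuffle(cells)
    return {"prompt": list(targets), "cells": cells}

def check_clicks(challenge, clicked):
    """Pass only if the clicked cells spell the prompt in the prompted order."""
    if len(clicked) != len(challenge["prompt"]):
        return False
    if any(not 0 <= i < GRID_SIZE for i in clicked):
        return False  # click outside the grid
    return [challenge["cells"][i] for i in clicked] == challenge["prompt"]
```

With the prompt "验证码", a user who clicks the cells holding 验, 证 and 码 in that order passes; under this sketch's ordering assumption, clicking the same three cells in a different order fails.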