Lexical Quality as a Measure for Textual Web Accessibility
- Ricardo Baeza-Yates, Yahoo! Research and Web Research Group, Universitat Pompeu Fabra
- Luz Rello, Natural Language Processing Research Group and Web Research Group, Universitat Pompeu Fabra
WAI RDWG Symposium on Website Accessibility Metrics, December 5, 2011
Problem Addressed
- Measurement of the lexical quality of the Web.
- Lexical quality is the representational aspect of textual Web content: the degree of quality of the words in a text (spelling errors, typos, etc.).
- Lexical quality is not presented as an accessibility metric, but it is useful, since the quality of words and language affects readers' understanding.
- Related to the WCAG principles that content be "perceivable" and "understandable".
Strategy
The output of the p(d) function changes depending on the number of elements to measure and d, which is the degree of disjunction.
- Defining a lexical quality metric based on the different kinds of errors that appear on the Web:
- Kinds of errors in sample W (1,345 errors):
- regular spelling: *toomorrow.
- typographical: *tomorroe.
- non-native speakers: *tomorow.
- dyslexic: *torromow.
- optical character recognition (OCR) errors: *tornorrow.
- Lexical quality metric:
- We measured a lower bound of the fraction (f) of Web pages with lexical errors and the relative fraction (d) of each kind of error in sample W.
- The corresponding fraction of Web pages with each kind of lexical error is then $f \times d$.
- To compute LQ, we estimate the document frequency (df) of each word by searching for it in the English pages of a leading search engine.
- $LQ = \operatorname{mean}_{w_i \in W_M} \left( \frac{df(\text{misspell } w_i)}{df(\text{correct } w_i)} \right)$
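The metric above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: W_M is modeled as a list of (misspelling, correct form, df of misspelling, df of correct form) rows, every df count and every f and d value is hypothetical, the names `lexical_quality` and `pages_by_error_kind` are invented for this sketch, and the search-engine querying step is not shown. It also assumes that a lower LQ (misspellings rarer relative to correct spellings) means better lexical quality.

```python
from statistics import mean

# Hypothetical rows: (misspelling, correct form, df(misspelling), df(correct)).
# The misspellings are the example errors listed above; the counts are
# invented. In the described method they would come from searching the
# English pages of a search engine.
W_M = [
    ("toomorrow", "tomorrow", 500, 1_000_000),    # regular spelling
    ("tomorroe", "tomorrow", 1_000, 1_000_000),   # typographical
    ("tomorow", "tomorrow", 20_000, 1_000_000),   # non-native speakers
    ("torromow", "tomorrow", 100, 1_000_000),     # dyslexic
    ("tornorrow", "tomorrow", 400, 1_000_000),    # OCR
]


def lexical_quality(rows):
    """LQ = mean over w_i in W_M of df(misspelled w_i) / df(correct w_i).

    Lower values mean misspellings are rarer relative to the correct
    spellings, i.e. (under this sketch's assumption) better lexical quality.
    """
    ratios = []
    for _misspelling, correct, df_miss, df_correct in rows:
        if df_correct == 0:
            raise ValueError(f"no occurrences of correct form {correct!r}")
        ratios.append(df_miss / df_correct)
    return mean(ratios)


def pages_by_error_kind(f, d_by_kind):
    """Estimated fraction of pages with each kind of error: f * d."""
    return {kind: f * d for kind, d in d_by_kind.items()}


if __name__ == "__main__":
    print(f"LQ = {lexical_quality(W_M):.4f}")

    # Hypothetical f (fraction of pages with errors) and d (share of each kind).
    d = {"regular spelling": 0.4, "typographical": 0.3,
         "non-native speakers": 0.2, "dyslexic": 0.05, "OCR": 0.05}
    for kind, frac in pages_by_error_kind(0.2, d).items():
        print(f"{kind}: {frac:.3f}")
```

Because LQ is a mean of per-word ratios, a single very common misspelling (here the hypothetical "tomorow" row) can dominate the score, which is one reason the choice of word set matters.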
Outcomes
- Although the measured lexical quality varies with the chosen set of words, the relative order of the measure hardly changes as the size of the set grows.
- LQ provides information about the quality of a website that is independent of its popularity and size measures.
- Pearson correlation with the following measures for the top 20 English-language sites on Alexa.com: Alexa unique visitors, number of pages, number of in-links, and ComScore unique visitors.

Table 1: Pearson correlation for several measures in the top 20 English sites of Alexa.com in March 2011.

| Measure | Alexa | Pages | Links | ComScore |
|---|---|---|---|---|
| LQ | 0.4451 | 0.4167 | 0.3966 | 0.2356 |
| Alexa | | 0.7659 | 0.6897 | 0.6589 |
| Size | | | 0.8655 | 0.3097 |
| Links | | | | 0.1319 |

- We applied our methodology to several large Web domains, the major English-speaking countries, and major websites and social media websites. As expected, some domains (USA and UK universities and government) have better lexical quality than the Web average, while media websites have worse lexical quality.
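Table 1 is an upper-triangular matrix of pairwise Pearson correlations. The sketch below shows one way such a matrix could be computed; the function names `pearson` and `correlation_table` and all per-site values are hypothetical (invented for five imaginary sites) and are not the data behind Table 1.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def correlation_table(measures):
    """Correlation for each unordered pair of measures (upper triangle)."""
    return {
        (a, b): pearson(measures[a], measures[b])
        for a, b in combinations(measures, 2)
    }


if __name__ == "__main__":
    # Hypothetical values for five imaginary sites (not Alexa/ComScore data).
    measures = {
        "LQ": [0.004, 0.010, 0.002, 0.007, 0.005],
        "Visitors": [120, 300, 80, 210, 150],
        "Pages": [2e6, 9e6, 1e6, 4e6, 5e6],
    }
    for (a, b), r in correlation_table(measures).items():
        print(f"{a} vs {b}: r = {r:.4f}")
```

Writing the coefficient out by hand keeps the formula visible; on Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient.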