Ngram Search





Vocabulary Analysis

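Term         Definition
IW           Intermediate Writing
HW           High-intermediate Writing
I_document   All intermediate-level articles combined
HI_document  All high-intermediate-level articles combined
%            Percentage
articles     Total number of articles
types        Total number of word types (distinct words)
tokens       Total number of word tokens (running words, repeats included)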

Table 1. Word tokens and types
Measure    Intermediate   High-Intermediate   Both levels
tokens          1180886              916883       2097769
types             18261               18222         27787

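Table 1 shows the total number of word tokens and word types in all written samples at the Intermediate and High-intermediate levels, and for both levels combined. Token counts are additive (1,180,886 + 916,883 = 2,097,769), but type counts are not: the combined vocabulary of 27,787 types is smaller than 18,261 + 18,222 = 36,483 because 8,696 word types occur at both levels and are counted only once.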
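Why token counts add across levels while type counts do not can be shown with a toy example. This is a minimal sketch, assuming a simple lowercase regular-expression tokenizer; the source does not describe its tokenization, and the function names and the two tiny sample corpora are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def count_tokens_and_types(texts: list[str]) -> tuple[int, set[str]]:
    # Tokens: every running word. Types: the set of distinct words.
    tokens = [tok for text in texts for tok in tokenize(text)]
    return len(tokens), set(tokens)


# Hypothetical toy corpora standing in for the two proficiency levels.
intermediate = ["I like my school.", "My school is big."]
high_intermediate = ["I like reading books at school."]

i_tokens, i_types = count_tokens_and_types(intermediate)
h_tokens, h_types = count_tokens_and_types(high_intermediate)

combined_tokens = i_tokens + h_tokens      # token counts simply add up
combined_types = len(i_types | h_types)    # type counts come from a set union
shared_types = len(i_types & h_types)      # words used at both levels

print(f"tokens: {i_tokens} + {h_tokens} = {combined_tokens}")
# tokens: 8 + 6 = 14
print(f"types: {len(i_types)} + {len(h_types)} - {shared_types} shared = {combined_types}")
# types: 6 + 6 - 3 shared = 9
```

Real counts depend heavily on tokenization choices such as case folding, the handling of contractions, and whether inflected forms are merged.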

Table 2. Word Use at the Intermediate Level: Analysis of Types and Tokens
Topic ID         IW-0801   IW-0802   IW-0803   IW-0861   IW-0862   IW-0863   IW-0901   IW-0962   IW-1001  I_document
articles             630       648       713       608       695       697      1414      1452      1520        8377
% articles         7.52%     7.74%     8.51%     7.26%     8.30%     8.32%    16.88%    17.33%    18.14%     100.00%
tokens             82986     91086     86924     93464     87904     94418    201534    207567    235003     1180886
tokens/article   131.724   140.565   122.084   153.724   126.481   135.463   142.528   142.952   154.607     140.968
% tokens           7.03%     7.71%     7.36%     7.91%     7.44%     8.00%    17.07%    17.58%    19.90%     100.00%
types               3342      5108      4219      4211      4756      4183      5639      5379      6118       18261
types/article     82.887    85.071    73.281    91.882    77.114    78.824    85.056    85.499    88.646      83.966
% types           18.30%    27.97%    23.10%    23.06%    26.04%    22.91%    30.88%    29.46%    33.50%     100.00%

Table 2 lists, for each intermediate-level topic, the number of articles and the total, per-article average, and percentage share of word tokens and word types. The rightmost column (I_document) gives the totals for all intermediate-level articles. IW-0801, IW-0802, IW-0803, IW-0861, IW-0862, IW-0863, IW-0901, IW-0962, and IW-1001 are topic identifiers. Each type percentage is that topic's type count as a share of all 18,261 intermediate-level types; because many words occur under more than one topic, these percentages sum to well over 100%.
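The per-topic rows of Tables 2 and 3 can be recomputed from tokenized articles. This is a minimal sketch, assuming each article is already a list of tokens grouped under its topic ID; the helper `topic_stats` and the sample data are hypothetical. The source does not define the per-article type average, so the sketch takes it as the mean number of distinct words per article, which fits the reported figures (for IW-0801, 82.887 types per article, whereas 3,342 types spread over 630 articles would give only about 5.3).

```python
from statistics import mean


def topic_stats(articles_by_topic: dict[str, list[list[str]]]) -> dict[str, dict[str, float]]:
    """Per-topic counts in the style of Tables 2 and 3 (hypothetical helper).

    articles_by_topic maps a topic ID to its articles; each article is a list of tokens.
    """
    all_articles = [art for arts in articles_by_topic.values() for art in arts]
    total_articles = len(all_articles)
    total_tokens = sum(len(art) for art in all_articles)
    total_types = len({tok for art in all_articles for tok in art})

    stats = {}
    for topic, arts in articles_by_topic.items():
        n_tokens = sum(len(art) for art in arts)
        topic_types = {tok for art in arts for tok in art}
        stats[topic] = {
            "articles": len(arts),
            "pct_articles": 100 * len(arts) / total_articles,
            "tokens": n_tokens,
            "tokens_per_article": n_tokens / len(arts),
            "pct_tokens": 100 * n_tokens / total_tokens,
            "types": len(topic_types),
            # Assumed meaning: mean of each article's own distinct-word count.
            "types_per_article": mean(len(set(art)) for art in arts),
            # Share of the whole level's vocabulary; topics overlap, so these can sum past 100%.
            "pct_types": 100 * len(topic_types) / total_types,
        }
    return stats


sample = {
    "T1": [["i", "like", "my", "school"], ["my", "school", "is", "big"]],
    "T2": [["i", "like", "books"]],
}
stats = topic_stats(sample)
for topic, row in stats.items():
    print(topic, {k: round(v, 2) for k, v in row.items()})
print(round(sum(row["pct_types"] for row in stats.values()), 2))  # 128.57
```

Passing every article under a single key reproduces the rightmost total column, where all three percentages come out as 100%.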

Table 3. Word Use at the High-intermediate Level: Analysis of Types and Tokens
Topic ID         HW-0801   HW-0861   HW-0901   HW-0961   HW-1001  HI_document
articles             495       502      1159      1178      1047         4381
% articles        11.30%    11.46%    26.46%    26.89%    23.90%      100.00%
tokens             99770    110561    238290    239369    228893       916883
tokens/article   201.556   220.241   205.600   203.199   218.618      209.286
% tokens          10.88%    12.06%    25.99%    26.11%    24.96%      100.00%
types               4843      5031      8839      7839      7536        18222
types/article    112.081   125.833   122.009   118.517   124.189      120.907
% types           26.58%    27.61%    48.51%    43.02%    41.36%      100.00%

Table 3 gives the same measures for each high-intermediate topic: the number of articles and the total, per-article average, and percentage share of word tokens and word types. The rightmost column (HI_document) gives the totals for all high-intermediate articles. HW-0801, HW-0861, HW-0901, HW-0961, and HW-1001 are topic identifiers. As in Table 2, the type percentages are shares of the level's 18,222 types and therefore do not sum to 100%.

Table 4. Percentage of words at each vocabulary level in intermediate and high-intermediate compositions. Levels follow the word list of Jeng, Hengsyung (2002), in which each band of 1,000 words forms one level; words not on the list are counted as Level 7*.
Topic ID (articles)   Level 1   Level 2   Level 3   Level 4   Level 5   Level 6  Level 7*     Total
IW-0801 (630)          77.64%     8.83%     3.71%     3.65%     0.24%     0.20%     5.75%   100.00%
IW-0802 (648)          74.71%     8.26%     3.85%     3.55%     0.44%     0.24%     8.97%   100.00%
IW-0803 (712)          74.25%    10.13%     2.97%     1.90%     0.29%     0.47%     9.98%   100.00%
IW-0861 (608)          79.75%     6.84%     2.98%     2.02%     0.31%     0.21%     7.89%   100.00%
IW-0862 (695)          74.34%    10.33%     3.96%     3.20%     0.31%     0.23%     7.63%   100.00%
IW-0863 (697)          74.22%    10.07%     5.34%     2.20%     0.39%     0.25%     7.54%   100.00%
IW-0901 (1414)         65.57%     5.66%     5.85%     2.15%     0.22%     0.17%    20.38%   100.00%
IW-0962 (1452)         67.64%     3.91%     7.34%     7.91%     0.93%     0.14%    12.12%   100.00%
IW-1001 (1520)         68.94%     7.97%     4.74%     2.24%     0.69%     0.12%    15.30%   100.00%
I_document (3990)      68.41%     5.59%     5.58%     3.71%     0.46%     0.15%    16.10%   100.00%
HW-0801 (495)          65.55%    13.50%     5.94%     5.86%     0.76%     0.82%     7.58%   100.00%
HW-0861 (502)          65.37%    13.54%     6.74%     4.90%     1.48%     0.48%     7.49%   100.00%
HW-0901 (1159)         66.73%    12.05%     6.17%     5.01%     0.73%     1.12%     8.20%   100.00%
HW-0961 (1178)         64.30%    12.75%     6.24%     5.68%     1.37%     1.53%     8.14%   100.00%
HW-1001 (1047)         67.15%    13.11%     5.42%     5.64%     0.75%     0.58%     7.36%   100.00%
HI_document (4381)     65.90%    12.82%     6.05%     5.42%     1.00%     0.99%     7.83%   100.00%
*Level 7: words not listed in Jeng (2002).
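The level assignment in Table 4 amounts to a position lookup in the word list. This is a minimal sketch, assuming the list is available as a sequence ordered from Level 1 onward, that tokens are matched by exact lowercase form (the source does not say whether inflected forms were mapped to their base words), and that percentages are taken over tokens rather than types; `build_level_lookup`, `level_profile`, and the six-word toy list are hypothetical stand-ins, not Jeng's (2002) list.

```python
from collections import Counter


def build_level_lookup(word_list: list[str], band: int = 1000) -> dict[str, int]:
    """Map each listed word to its level: positions 0-999 -> 1, 1000-1999 -> 2, and so on."""
    lookup: dict[str, int] = {}
    for position, word in enumerate(word_list):
        # Keep the first (lowest) level if a word happens to be listed twice.
        lookup.setdefault(word.lower(), position // band + 1)
    return lookup


def level_profile(tokens: list[str], lookup: dict[str, int], off_list_level: int = 7) -> dict[int, float]:
    """Percentage of tokens at each level; unlisted words go to off_list_level.

    Assumes the word list has fewer bands than off_list_level (six bands for a 6,000-word list).
    """
    counts = Counter(lookup.get(tok.lower(), off_list_level) for tok in tokens)
    total = sum(counts.values())
    return {level: 100 * counts[level] / total for level in range(1, off_list_level + 1)}


# Hypothetical six-word list with bands of two words, standing in for a real 1,000-word banding.
toy_list = ["the", "a", "school", "like", "book", "library"]
lookup = build_level_lookup(toy_list, band=2)
profile = level_profile(["The", "school", "library", "has", "a", "book"], lookup)
print({level: round(pct, 2) for level, pct in profile.items()})
# {1: 33.33, 2: 16.67, 3: 33.33, 4: 0.0, 5: 0.0, 6: 0.0, 7: 16.67}
```

Here "has" is not on the toy list, so it lands in Level 7, just as off-list words do in the table.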