How to Use Lesson Transcripts to Track Your CEFR Level Over Time

Most language learners are sitting on a goldmine of progress data and don't know it. Every online lesson you've taken, if it was transcribed, contains detailed evidence of your language ability at a specific point in time. Your vocabulary choices, your sentence structures, how often you hesitated, how complex your responses were: all of it is in the text.

The challenge is knowing what to look for and how to turn raw transcript data into a picture of your progress over time. This guide walks through exactly that.

Step 1: Collect Your Transcripts

Before you can analyze anything, you need transcripts. Here's how to get them depending on where your lessons happen.

Zoom lessons: Zoom can auto-transcribe recorded meetings if you have a paid account with cloud recording enabled. Go to your recording settings and enable audio transcription. After your lesson, the transcript appears alongside the recording in your Zoom account.

Preply and iTalki: Both platforms record lessons by default when both parties consent. Preply provides a transcript through their lesson notes feature. For iTalki, you may need to use a third-party transcription tool on your recording.

Google Meet: Google Meet provides auto-generated transcripts on Workspace accounts. The quality varies but is usually sufficient for linguistic analysis.

Manual transcription: If you have audio but no transcript, tools like Otter.ai or Rev.com can transcribe a recording quickly. For analysis purposes, accuracy above roughly 85% is sufficient.

Once you have a transcript, save it as a .txt or .docx file. Remove anything other than the lesson conversation, including platform-generated metadata at the top of the file.

Step 2: Separate Your Speech Turns From the Tutor's

For progress analysis, you want to analyze only what you said, not what your tutor said. In most transcripts, turns are labeled by speaker name or role. Make sure your turns are clearly distinguishable.

A typical transcript looks something like this:

Tutor: So, tell me about your experience with the project.
Learner: Yes, the project was very challenging for me. We had to coordinate with three different teams, which I haven't done before, and sometimes the communication was not clear.

Your analysis focuses only on the learner turns. If your transcript doesn't clearly label speakers, go through and manually tag which turns are yours before moving to the next step.

For accurate CEFR analysis, you want at least 200 words of your own speech per transcript. A typical 45-minute conversation lesson should produce well over that.
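If you are comfortable with a little scripting, the separation step can be sketched in Python. This is a minimal sketch that assumes every turn starts with a speaker label followed by a colon, as in the example above. The function names and the default labels "Tutor" and "Learner" are illustrative, so swap in whatever names your platform writes.

```python
import re

MIN_WORDS = 200  # minimum amount of learner speech suggested above


def split_turns(transcript, speakers=("Tutor", "Learner")):
    """Return (speaker, text) pairs in the order they appear."""
    labels = "|".join(re.escape(name) for name in speakers)
    # With one capture group, re.split returns
    # [preamble, speaker, text, speaker, text, ...].
    pieces = re.split(rf"\b({labels}):\s*", transcript)
    return [(pieces[i], pieces[i + 1].strip())
            for i in range(1, len(pieces) - 1, 2)]


def learner_turns(transcript, learner="Learner", speakers=("Tutor", "Learner")):
    """Keep only the turns spoken by the learner."""
    return [text for who, text in split_turns(transcript, speakers)
            if who == learner]


def word_count(turns):
    """Count whitespace-separated words across all turns."""
    return sum(len(turn.split()) for turn in turns)


def has_enough_speech(turns, minimum=MIN_WORDS):
    """True when the learner speech reaches the suggested minimum."""
    return word_count(turns) >= minimum
```

Because the split works on labels rather than line breaks, it handles transcripts where several turns run together on one line. It will misfire if a label such as "Learner:" also appears inside someone's speech, so spot-check the output on your first transcript.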

Step 3: Identify the Six Linguistic Signal Categories

CEFR proficiency maps to observable features in language output. Here are the six main signal categories and what to look for in your transcripts.

Vocabulary range: How diverse is your word choice? Are you using everyday, high-frequency words or lower-frequency, domain-specific vocabulary? Repetition of the same basic words (good, nice, very, big) suggests a narrower range. Use of words like "coordinate," "terminology," or "elaborate" suggests a wider range.
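One rough way to put a number on vocabulary range is the type-token ratio (unique words divided by total words). That measure is not named in this guide's reference points, so treat it as an added assumption. The sketch below also reports how much of your speech is made up of the basic words mentioned above. The function names are illustrative.

```python
import re

# The basic words named above; extend this set with your own habits.
BASIC_WORDS = {"good", "nice", "very", "big"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Unique words divided by total words (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def basic_word_share(text):
    """Fraction of all words that come from BASIC_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in BASIC_WORDS for token in tokens) / len(tokens)
```

Type-token ratio drops as a text gets longer, so compare samples of similar length, for example the first 200 words of your speech from each transcript.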

Grammar complexity: Count how many different tense forms you use. A1-A2 speakers typically rely on simple present and simple past. B1 speakers add present perfect and future forms. Speakers at B2 and above use past perfect, conditionals, and passive constructions naturally. Also look at your sentence structure: are most of your sentences simple subject-verb-object, or do you use subordinate clauses and relative clauses regularly?
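Counting clauses properly needs a parser, but a keyword heuristic gives a first approximation. The sketch below assumes that a sentence containing a word like "because" or "which" has a subordinate or relative clause, and reports the share of such sentences. Both the marker list and that definition of the ratio are assumptions made for illustration, so the numbers may not line up exactly with other ways of computing a subordinate clause ratio.

```python
import re

# Hypothetical marker list. "that" is left out because it is so often a
# plain demonstrative; "when", "which", and "since" can still misfire.
SUBORDINATORS = {
    "because", "although", "though", "unless", "if", "while",
    "whereas", "since", "when", "which", "who", "whose",
}


def split_sentences(text):
    """Split on whitespace that follows ., !, or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def subordinate_sentence_ratio(text):
    """Share of sentences containing at least one marker word."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = 0
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & SUBORDINATORS:
            hits += 1
    return hits / len(sentences)
```

Auto-generated transcripts often punctuate spoken language poorly, which affects the sentence split, so read the result as a trend indicator rather than a precise measurement.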

Fluency markers: Look for filler words and self-corrections. Words like "um," "uh," "like," "you know," and false starts ("I think I, well, what I mean is...") indicate fluency strain. A lower filler ratio and longer average response length generally correspond to higher fluency. Compare your average response length in early transcripts to recent ones.
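Both fluency measures are simple arithmetic once your turns are in a list. The sketch below is one way to compute them. The default filler list comes from the examples above, and the function names are illustrative. Note that "like" is counted every time it appears, including as a verb, so the ratio overcounts for speakers who say "I like..." a lot.

```python
import re


def filler_ratio(turns, fillers=("um", "uh", "like", "you know")):
    """Words belonging to filler expressions divided by all words."""
    text = " ".join(turns).lower()
    total_words = len(re.findall(r"[a-z']+", text))
    if total_words == 0:
        return 0.0
    filler_words = 0
    for filler in fillers:
        occurrences = len(re.findall(rf"\b{re.escape(filler)}\b", text))
        # A two-word filler such as "you know" counts as two words.
        filler_words += occurrences * len(filler.split())
    return filler_words / total_words


def average_response_length(turns):
    """Mean number of whitespace-separated words per turn."""
    if not turns:
        return 0.0
    return sum(len(turn.split()) for turn in turns) / len(turns)
```

Run both functions on an early transcript and a recent one, using the same filler list each time, so the two numbers are comparable.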

Error density: Count grammatical errors per 100 words of your speech. Be honest here. Common error types include subject-verb agreement, article usage (a/an/the), preposition choice, and tense consistency. A rough count of errors and their types tells you which areas need focus.
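The error count itself is manual, but the rate and the breakdown are quick to compute once you have tagged each error with a type. A minimal sketch, with illustrative function names and tag labels:

```python
from collections import Counter


def errors_per_100_words(error_tags, word_count):
    """Error density: number of tagged errors per 100 words of speech."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100 * len(error_tags) / word_count


def most_common_error_types(error_tags, top=3):
    """Return (error type, count) pairs, most frequent first."""
    return Counter(error_tags).most_common(top)
```

For example, four tagged errors in 250 words of speech works out to 1.6 errors per 100 words, and the breakdown shows which error type to work on first.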

Response sophistication: Do you answer questions with short confirmations or do you elaborate, give examples, add nuance, and express opinions with supporting reasoning? Discourse markers like "however," "on the other hand," "what I mean by that is," and "to give an example" are B2+ indicators.

Progression indicators: Compare a recent transcript to an older one. Did your average sentence length increase? Are you using tense forms you weren't using before? Did your filler ratio decrease?

Step 4: Map Your Signals to CEFR Levels

Each signal maps to a CEFR range. Here are rough reference points.

| Signal | B1 | B2 | C1 |
|--------|----|----|----|
| Tense variety | 3-4 tense forms | 5-7 tense forms | 7+ with confident use |
| Subordinate clause ratio | 0.10-0.20 | 0.25-0.40 | 0.40+ |
| Avg response length | 10-15 words | 20-35 words | 40+ words |
| Filler ratio | High (10%+) | Moderate (5-10%) | Low (<5%) |
| Error rate per 100 words | 5-10 | 2-5 | Under 2 |

Your overall CEFR level is a composite of all these signals. One signal alone doesn't determine your level. A learner with C1 vocabulary but B1 grammar complexity would be assessed as B1-B2 overall, with a note that their vocabulary is outpacing their grammar.

This uneven profile is common and meaningful. It tells you exactly where to direct your study effort.
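The reference points can be turned into a small lookup. The sketch below makes several assumptions of its own: values that fall in the gaps between ranges are assigned to the lower level, anything below the B2 cutoff is reported as B1 (the reference points stop there), tense variety is left out because "confident use" is a judgment call, and the composite is taken as the lower median of the individual signals. None of these choices is the only reasonable one.

```python
from statistics import median_low

LEVELS = ["B1", "B2", "C1"]


def signal_profile(clause_ratio, avg_response_words, filler_ratio, errors_per_100):
    """Map four measured signals to a level each.

    filler_ratio is a fraction (0.05 means 5%).
    """
    return {
        "subordinate clause ratio":
            "C1" if clause_ratio >= 0.40 else "B2" if clause_ratio >= 0.25 else "B1",
        "avg response length":
            "C1" if avg_response_words >= 40 else "B2" if avg_response_words >= 20 else "B1",
        "filler ratio":
            "C1" if filler_ratio < 0.05 else "B2" if filler_ratio < 0.10 else "B1",
        "error rate":
            "C1" if errors_per_100 < 2 else "B2" if errors_per_100 <= 5 else "B1",
    }


def composite_level(profile):
    """Assumed composite rule: lower median of the per-signal levels."""
    ranks = sorted(LEVELS.index(level) for level in profile.values())
    return LEVELS[median_low(ranks)]
```

Keep the per-signal dictionary alongside the composite. The uneven profile it exposes is the part that tells you where to study.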

Step 5: Track Changes Across Multiple Transcripts

The real value of transcript analysis comes from comparison. One transcript gives you a snapshot. Five transcripts over five months give you a trajectory.

For each transcript, record:

The date of the lesson
Your overall estimated CEFR level
Sub-skill assessments for vocabulary, grammar, fluency, accuracy, and discourse
Two or three specific quotes that represent your current level

Over time, you'll see which skills are improving and which are stagnant. This is the information your tutor needs to focus your lessons productively.
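If you keep these records in a script rather than a notebook, a small data structure is enough. The sketch below stores the four items listed above and reports how many CEFR levels each sub-skill moved between your earliest and latest record. The class and function names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


@dataclass
class TranscriptRecord:
    lesson_date: date
    overall: str                      # overall estimated CEFR level
    sub_skills: dict[str, str]        # e.g. {"grammar": "B1", ...}
    quotes: list[str] = field(default_factory=list)


def skill_changes(records):
    """Levels gained (+) or lost (-) per sub-skill, earliest to latest record."""
    ordered = sorted(records, key=lambda record: record.lesson_date)
    first, last = ordered[0], ordered[-1]
    return {
        skill: CEFR_ORDER.index(last.sub_skills[skill])
        - CEFR_ORDER.index(first.sub_skills[skill])
        for skill in first.sub_skills
        if skill in last.sub_skills
    }
```

A result of 0 for a sub-skill across several months is the stagnation signal worth raising with your tutor.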

Why This Is Hard to Do Manually (And What Helps)

The analysis above is tractable for one transcript. For ongoing tracking across many transcripts, it becomes genuinely time-consuming. Counting subordinate clauses, calculating filler ratios, and categorizing error types all add up. And doing it consistently enough for the results to be comparable across time requires a level of methodological rigor that's hard to maintain on your own.

This is the problem Fluency Lens was built to solve. You upload a transcript, and it does all of this automatically. It extracts all six signal categories, assigns CEFR sub-levels with plain-language explanations backed by direct quotes from your transcript, and tracks your trajectory on a visual dashboard with separate trend lines for vocabulary, grammar, fluency, and accuracy over time.

The CEFR classifications don't just give you a number. If your grammar is rated B2, the explanation tells you why: which tense forms you used, what your subordinate clause ratio was, and how that compares to the B2 threshold. That specificity is what makes the data actionable rather than just informational.

For the manual approach, start with three transcripts from different points in your learning and compare the signals directly. You'll likely see more change than you expected, in directions that surprise you.

For a faster, more systematic approach, the free tier at Fluency Lens lets you upload three transcripts and see the full signal extraction and CEFR breakdown at no cost. That's enough to see what your data actually looks like.

FAQ

How many transcripts do I need before tracking progress makes sense?
Two transcripts from different dates give you a minimum comparison. Three or more start to show a reliable trend. The more transcripts you have, and the more spread out they are over time, the clearer your trajectory becomes.

What if my transcript doesn't label speaker turns?
Go through the transcript and manually label which turns are yours before analysis. Without separating your speech from the tutor's, the analysis will include their language, which is typically much more advanced and will skew every signal.

What does an uneven CEFR profile mean?
It means some sub-skills are stronger than others. This is normal. Learners who acquire language through immersion often have strong vocabulary and discourse sophistication but weaker grammar accuracy. Learners who studied formally often have the reverse. An uneven profile tells you which sub-skill to prioritize.

Can I use this approach for languages other than English?
The CEFR framework applies to all languages. The linguistic signals described here are also language-general, though the specific thresholds (subordinate clause ratios, tense variety) vary by language. Fluency Lens currently supports transcript analysis with the target language specified during upload.