How to Remove Hebrew Vowels from Text with JavaScript or Python


Here’s an easy way to remove Hebrew vowels (niqqud or nekudot) from Unicode text using Python, JavaScript, or any programming language. This tutorial uses an excerpt from the Hebrew Bible as an example.


What are Nekudot?

Hebrew is usually written without vowel marks, so you can’t tell exactly how to pronounce a word just by looking at it. The letters are consonants, although a few of them (such as ו and י) can also hint at vowel sounds.

For example, the Hebrew word meshugah (“crazy”) is written with just the letters M-SH-W-G-’ (משוגע). The W (the letter vav) stands in for the U sound, but the E and A sounds aren’t written.

Another example is the word lekhem, which is typically written only as L-KH-M (לחם).

To make the writing easier to read, the letters are sometimes written with special markings called niqqud, also known as nekudot (“dots”). The marks are dots and lines that look like this when added to the word meshugah:

Vowel markings are most commonly found in two places: the Bible, and books written for kids or people who are learning Hebrew.

Here’s what the first line of the book of Genesis (Bereshit) looks like with vowel markings:

The first line of the Bible in Hebrew

How to Remove the Nekudot from Unicode Text

The vowel markings are stored as separate combining characters that follow the letter they belong to, so they can be stripped out of the text without touching the letters. We want to remove the characters in this Hebrew Unicode block that are marked with red in the image below.

Hebrew Unicode block with the niqqud
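To confirm that the marks really are separate characters, Python’s standard unicodedata module can list every code point in a pointed word. This is a small inspection sketch; the variable word is illustrative and holds bara (“created”) from Genesis 1:1, built from escape codes and without its cantillation mark.

```python
import unicodedata

# Illustrative word: BET, DAGESH, QAMATS, RESH, QAMATS, ALEF
# (each letter is followed by the combining marks that belong to it)
word = "\u05D1\u05BC\u05B8\u05E8\u05B8\u05D0"

for ch in word:
    # "Lo" means a letter; "Mn" means a non-spacing (combining) mark
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```

Only the Mn rows are marks added on top of the consonants, and all of them fall inside the range that gets removed.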

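The range is U+0591 to U+05C7. Besides the vowel points, it also covers the cantillation marks (te’amim, U+0591 to U+05AF) printed in the Bible, plus a few punctuation characters: maqaf (U+05BE), paseq (U+05C0), sof pasuq (U+05C3), and nun hafukha (U+05C6). (You can adjust which ones are removed by looking up the codes in the chart above.) They can be stripped out with a regular expression.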
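Not everything in that range is a vowel point. This sketch, again using Python’s unicodedata module, lists the code points from U+0591 to U+05C7 that are not combining marks; the name non_marks is just for illustration.

```python
import unicodedata

# Walk U+0591..U+05C7 and keep anything whose general category is not "Mn"
# (non-spacing mark). Those code points are punctuation, not vowels.
non_marks = [
    (f"U+{cp:04X}", unicodedata.name(chr(cp)))
    for cp in range(0x0591, 0x05C7 + 1)
    if unicodedata.category(chr(cp)) != "Mn"
]

for code, name in non_marks:
    print(code, name)
```

It reports the four punctuation characters maqaf, paseq, sof pasuq, and nun hafukha. If your text uses maqaf between words, deleting it along with the marks will run those words together.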

Remove Hebrew Vowels with JavaScript

This JS function will remove the vowel marks from Hebrew text that is passed into it:

function removeVowels(text) {
  return text.replace(/[\u0591-\u05C7]/g, "");
}

The first words of the Bible are:

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ

(“In the beginning, God created the heavens and the earth.”)

Here’s the code to create a version without the vowel marks. You can run it right in the browser console:

function removeVowels(text) {
  return text.replace(/[\u0591-\u05C7]/g, "");
}

const originalText =
  "בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ";

const vowelsRemoved = removeVowels(originalText);

// print out the results
console.log(vowelsRemoved);

It will print out “בראשית ברא אלהים את השמים ואת הארץ”.

Remove Hebrew Vowels with Python

Here’s some Python code that will do the same thing.

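import re

# Matches any single character from U+0591 to U+05C7 (vowel points and other marks)
vowel_pattern = re.compile(r"[\u0591-\u05C7]")


def remove_vowels(text: str) -> str:
    # Delete every match; characters outside the range are kept
    return vowel_pattern.sub("", text)


print(remove_vowels("בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ"))
# prints: בראשית ברא אלהים את השמים ואת הארץ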
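Because the full range also covers the cantillation marks and the maqaf, you may want finer control over what gets removed. Here is a sketch of two narrower variants. The function and constant names are hypothetical, and the sub-range U+0591 to U+05AF for cantillation comes from the layout of the Unicode Hebrew block.

```python
import re

# Hypothetical sub-ranges based on the layout of the Unicode Hebrew block
CANTILLATION = re.compile(r"[\u0591-\u05AF]")  # te'amim (Bible accent marks)
ALL_MARKS = re.compile(r"[\u0591-\u05C7]")     # the full range used in this tutorial
MAQAF = "\u05BE"                               # Hebrew hyphen that joins two words


def remove_cantillation(text: str) -> str:
    """Strip only the accent marks; vowel points stay in place."""
    return CANTILLATION.sub("", text)


def remove_vowels_keep_words(text: str) -> str:
    """Turn each maqaf into a space first so joined words stay separate."""
    return ALL_MARKS.sub("", text.replace(MAQAF, " "))


# Usage: "kol ha'arets" (all the earth), written with a maqaf between the words
pointed = "\u05DB\u05BC\u05B8\u05DC\u05BE\u05D4\u05B8\u05D0\u05B8\u05E8\u05B6\u05E5"
print(remove_vowels_keep_words(pointed))  # כל הארץ
```

With the full-range pattern alone, the same input would come out as one run-together word, כלהארץ.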

Other Programming Languages

The same technique can be used in other programming languages too. Just delete every character in the range U+0591 to U+05C7 from the string, and the vowels will be gone.
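In a language without convenient regular expressions, the same idea works as a plain character filter. Here’s a sketch of that approach in Python, using str.translate instead of re; the names MARKS_TABLE and remove_vowels_no_regex are hypothetical.

```python
# Hypothetical regex-free version: map every code point in U+0591..U+05C7 to None.
# dict.fromkeys gives each key the value None, and str.translate deletes
# any character whose mapping is None.
MARKS_TABLE = dict.fromkeys(range(0x0591, 0x05C7 + 1))


def remove_vowels_no_regex(text: str) -> str:
    return text.translate(MARKS_TABLE)


# Usage: "bara" with dagesh, two qamats, and a cantillation mark
print(remove_vowels_no_regex("\u05D1\u05BC\u05B8\u05E8\u05B8\u05A3\u05D0"))  # ברא
```

The same table-lookup idea carries over to any language: loop over the characters and skip those whose code point falls inside the range.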
