Interview Question

Data Engineer Interview

-

Meta

want you to write me a simple spell checking engine. The query language is a very simple regular expression-like language, with one special character: . (the dot character), which means EXACTLY ONE character (it can be any character). So, for example, 'c.t' would match 'cat' as the dot matches any character. There may be any number of dot characters in the query (or none). Your spell checker will have to be optimized for speed, so you will have to write it in the required way. There would be a one-time setUp() function that does any pre-processing you require, and then there will be an isMatch() function that should run as fast as possible, utilizing that pre-processing. There are some examples below, feel free to ask for clarification. Word List: [cat, bat, rat, drat, dart, drab] Queries: cat -> true c.t -> true .at -> true ..t -> true d..t -> true dr.. -> true ... -> true .... -> true ..... -> false h.t -> false c. -> false */ // write a function // Struct setup(List<String> list_of_words) // Do whatever processing you want here // with reasonable efficiency. // Return whatever data structures you want. // This function will only run once // write a function // bool isMatch(Struct struct, String query) // Returns whether the query is a match in the // dictionary (True/False) // Should be optimized for speed

AnswerAdd Tags

Interview Answers

25 Answers

3

Here is the Python Code (inspired by someone's code on this page): def setUp(word, input_list): word = word.strip() temp_list = [] Ismatch = False if word in input_list: Ismatch = True elif word is None or len(word) == 0: Ismatch = False else: for w in input_list: if len(w) == len(word): temp_list.append(w) for j in range(len(temp_list)): count=0 for i in range(len(word)): if word[i] == temp_list[j][i] or word[i] == '.': count += 1 else: break if count == len(word): Ismatch = True print(Ismatch) def isMatch(word, input_list): return setUp(word, input_list) isMatch('c.t', ['cat', 'bte', 'art', 'drat', 'dart', 'drab'])

_bat on

4

bear in mind for your solution, checking the lengths of words in the dictionary is very fast. That's what you can use your setup for. There's no need to iterate through the whole loop of checks if the word fails the length already. See my solution above

iwin on

1

This was the fastest I could do without regex: def func(wrd,lst): if len(wrd) not in [len(x) for x in lst]: return False elif wrd in lst: return True else: lst1 = [x for x in lst if len(x)==len(wrd)] for z in lst1: c=0 for i in range(len(wrd)): if wrd[i] != '.' and wrd[i] == z[i]: c=c+1 if len(wrd)-wrd.count('.') == c: return True return False

Dnyanraj Nimbalkar aka Samrat on

1

""" words = ["cat","dog","four","five"] { 3: { 0: {'c': 1, 'd': 1}, 1: {'a': 1, 'o': 1}, 2: {'t': 1, 'g': 1} }, 4: { 0: {'f': 1}, 1: {'o': 1, 'i': 1}, 2: {'u': 1, 'v': 1}, 3: {'r': 1, 'e': 1} } } """ def setup(words): wordsDict = {} for word in words: wordLength = len(word) try: wordsDict[wordLength] except: wordsDict[wordLength] = {} for i in range(wordLength): try: wordsDict[wordLength][i] except: wordsDict[wordLength][i] = {} wordsDict[wordLength][i][word[i]] = 1 try: wordsDict[wordLength][i][word[i]] except: wordsDict[wordLength][i][word[i]] = 1 return wordsDict def is_match(wordsDict, query): queryLength = len(query) try: wordsDict[queryLength] for i in range(len(query)): if query[i] == ".": continue else: try: wordsDict[queryLength][i][query[i]] except: return False return True except: return False print(is_match(setup(["cat","bat","rat","drat"]),"..t"))

Anonymous on

1

def isMatch(word, exp): if type(word) == list: return list(map(lambda x: isMatch(x, exp), word)) elif type(word) == str: lword = len(word) lexp = len(exp) if lword != lexp: return False for i, n in enumerate(exp): if n == '.': pass elif n != word[i]: return False return True

Anonymous on

1

def spellchecker(word_list,spell): # if spell in word_list: # return True k=False for word in word_list: if len(spell)==len(word): k= all(spell[i]==word[i] or spell[i]=='.' for i in range(len(spell))) return k w=['cat', 'bat', 'rat', 'drat', 'dart', 'drab'] r='d.a.' print(spellchecker(w,r))

Anonymous on

1

def isMatch(query, input_list): query = query.strip() tmp_list = [] is_match = False # First case, word in list if query in input_list: return True elif len(query) == 0 or not query: # Query is empty or None return False else: # Loop through input list for word in input_list: if len(word) == len(query): # Determine if its a match if all(q==c or q=='.' for q,c in zip(query,word)): return True return False if __name__ == '__main__': query = 'd..t' words = ['cat', 'bat', 'rat', 'drat', 'dart', 'drab'] isMatch(query, words)

Anonymous on

0

def isMatch(query, input_list): query = query.strip() # First case, word in list if query in input_list: return True elif len(query) == 0 or not query: # Query is empty or None return False else: # Loop through input list for word in input_list: if len(word) == len(query): # Determine if it can be a match if all(q==c or q=='.' for q,c in zip(query,word)): return True return False

_if on

0

def isMatch(pattern, input_list): pattern = pattern.strip() is_match = False patternLen = len(pattern) if pattern in input_list: return True elif patternLen == 0 or not pattern: return False else: for word in input_list: if is_match: break for i, c in enumerate(word[:-patternLen+1]): if pattern[0] == '.' or pattern[0] == c: if all(y=='.' or y==x for x,y in zip(word[i:],pattern)): is_match = True break return is_match

Anonymous on

0

class Spellcheck def initialize(wordlist) @words = wordlist @lengths = wordlist.map { |w| w.length }.uniq end def check(word) return false if not @lengths.include? word.length @words.each do |w| w.split('').each_with_index do |letter, i| if word[i] == '.' or word[i] == letter return true if w.length - 1 == i else break end end end false end end sc = Spellcheck.new %w[cat bat rat drat dart drab]; sc.check('c.t')

iwin on

0

from collections import Counter def isMatch(str1): arr1=['cat', 'bat', 'rat', 'drat', 'dart', 'drab'] arr_hash= {i:len(i) for i in arr1} arr_len=set(arr_hash.values()) if len(str1) not in arr_len: return False if str1 in arr_hash : return True match_arr=[x for x in arr_hash.keys() if arr_hash[x]==len(str1)] found=False for i in range(len(str1)): if str1[i]==".": found=True else: found=False for j in range(len(match_arr)): if match_arr[j][i]==str1[i]: found=True if found==False: return False return found assert isMatch('cat')==True assert isMatch('c.t')==True assert isMatch('l.t')==False assert isMatch('.at')==True assert isMatch('..t')==True assert isMatch('..k')==False assert isMatch('d..t')==True assert isMatch('dr..')==True assert isMatch('...')==True assert isMatch('....')==True assert isMatch('.....')==False print("passed")

Anonymous on

0

def setUp(word, dict): arr_dict = [] for w in dict: if len(w) == len(word): arr_dict.append(w) for x in arr_dict: cnt=0 for j in range(len(word)): print(word[j]) if (x[j] == word[j]) or (word[j]=='.'): cnt = cnt+1 if cnt == len(word): break return cnt def isMatch(word, dict): return setUp(word, dict) == len(word)

Anonymous on

0

public class Program { List wordList = new List {"cat","bat","rat","drat"}; public static void Main(string[] args) { var word = "..r"; Program pgm = new Program(); var result = pgm.wordChecker(word); result.Dump(); } private bool wordChecker(string word) { foreach (var wList in wordList) { if (wList.Length != word.Length) continue; for (int i = 0; i < wList.Length; i++) { if (word[i] == '.') continue; if (word[i] != wList[i]) return false; } return true; } return false; } }

Anonymous on

0

class solution : def __init__(self): self.dictionarWords = collections.defauldic(set) def add word (self, word) : seld.dictionarWords[len(word)].add(word) def search(self, word): m = len(word) for words in dictionaryWords[m]: i = 0 while i < m and (words[i] == word[i] or word[i] == '.') : i +=1 if i == m : return True return False

Simple answer on

0

from collections import defaultdict class SpellCheck: def __init__(self): self.wordDict = defaultdict(set) self.max = 0 self.min = 0 def setUp(self, wordList): self.min = len(wordList[0]) for word in wordList: if len(word) > self.max: self.max = len(word) if len(word) self.max or checkLength < self.min: return False for index, char in enumerate(checkWord): if (char not in self.wordDict or index not in self.wordDict[char]) and char != '.': return False return True if __name__ == "__main__": speller = SpellCheck() speller.setUp(['cat', 'bat', 'rat', 'drat', 'dart', 'drab']) checks = ['act','c.T','.at','..t','d..t','dr..','...','....', '.....', 'h.t', 'c', '.'] for word in checks: print(speller.isMatch(word))

Anonymous on

0

#Only quick optimization I can think of is to build a dictionary by length of word and divide the search space. from collections import defaultdict def process_dict(words): wordDict = defaultdict(list) for word in words: wordDict[len(word)].append(word) return wordDict def is_match(wordDict, match): if len(match) not in wordDict: return False for word in wordDict[len(match)]: #print(word) for i in range(len(match)): #print(i) if match[i] == "." or match[i] == word[i]: continue else: break else: return True return False #Tests words = ["cat", "bat", "rat", "drat", "dart", "drab"] myDict = process_dict(words) print(is_match(myDict, '..t')) print(is_match(myDict, 'fat')) print(is_match(myDict, 'drat')) print(is_match(myDict, 'c.t')) print(is_match(myDict, '...'))

anan on

0

a = ['cat','bat','rat','drat','drart','drab'] len_a = len(a) b=input("Please enter a string: ") len_b = len(b) hard_stop = 0 cnt = 0 for i in range (len_a): c=a[i] if (len(a[i]) == len_b): for j in range (len(c)): if (c[j] == b[j]): cnt = cnt + 1 elif (b[j] == '.'): cnt = cnt + 1 else: break if (cnt > 0 ): print("True") else: print("False")

Anonymous on

0

/* Describe the dictionary as a tree so that the number of character comparisons is minimised. */ #include #include #include #include typedef std::pair DictionaryKey; typedef std::multimap Dictionary; struct Node { std::map children; bool terminates; Node() : children(), terminates() {} ~Node() { std::for_each(children.begin(), children.end(), [](auto const& e) { delete e.second; }); } void add(char const* s) { BOOST_ASSERT(*s != '.'); if (*s) { auto found = children.find(*s); if (found == children.end()) { found = children.insert(std::make_pair(*s, new Node)).first; } found->second->add(++s); } else { terminates = true; } } bool isMatch(char const* s) const { if (!*s) return terminates; if ('.' == *s) { for (auto i = children.begin(), e = children.end(); i != e; ++i) { if (i->second->isMatch(++s)) return true; } } else { auto found = children.find(*s); if (found != children.end() && found->second->isMatch(++s)) return true; } return false; } }; std::unique_ptr setUp() { char const* words[] = { "cat", "bat", "rat", "drat", "dart", "drab" }; std::unique_ptr node(new Node); for (char const** i = words, ** e = &words[6]; i != e; ++i) { node->add(*i); } return node; } bool isMatch(std::unique_ptr const& s, char const* q) { return s->isMatch(q); }

James Taylor on

0

Can you do this in 5-10 min?

Anonymous on

0

import re class Test1: def set_up(self): self.word_list = ['cat', 'bat', 'rat', 'drat', 'dart', 'drab'] def is_match(self, query): for word in self.word_list: if bool(re.fullmatch(rf'{query}', word)): return True return False if __name__ == '__main__': queries = ['cat', 'c.t', '.at', '..t', 'd..t', '.....', 'h.t', 'c.'] test1 = Test1() test1.set_up() for q in queries: print(f'{q} -> {test1.is_match(q)}')

No way I can do this in < 15 min and without looking up regex syntax on

0

The key in these questions is to cover the fundamentals, and be ready for the back-and-forth with the interviewer. Might be worth doing a mock interview with one of the Facebook or ex-Facebook Data Engineer experts on Prepfully? They give real-world practice and guidance, which is pretty helpful. prepfully.com/practice-interviews

Anonymous on

0

def setUp(): words = list(input().strip().split()) check = input() return words, check def isMatch(words_list, test): flag = False for word in words_list: flag = True for i in range(min(len(word), len(test))): if test[i] != "." and test[i] != word[i]: flag = False break if flag: return flag return flag if __name__ == '__main__': words, check = setUp() print(isMatch(words, check))

Chandan on

1

Seriously, is this a question to be solved in 5 min? How did you solve it?

Anonymous on

1

Are we allowed to use Regex?

Anonymous on

4

how did you try this? Using TRIE?

Anonymous on

Add Answers or Comments

To comment on this, Sign In or Sign Up.