What is Instant Question Answering? ● User asks a question in text format and the instantQA system automatically retrieves or formulates an answer and presents it back to the user, instantly.
Why Instant Question Answering? ● Despite the continuous progress of search engines, many user needs still remain unanswered. ● Community Question Answering (e.g. the AnA platform) can feature factoid questions, but its primary goal is to satisfy needs such as: opinion seeking, recommendation, open-ended questions, problem solving. ● In community question answering, the user has to wait for the answer they seek, even if the question is very simple and asks for a mere fact. ● Better user experience: why browse through search result listings or related questions when the information can be served upfront?
Why Instant Question Answering? ● CASE: SHIKSHA.COM ● Top domains being searched, based on both query logs and data availability in listings: fees, duration, seats, application date, application URL, affiliation, approval, entrance exams, placement companies and job salaries. ● There is a high number of fact-type questions, which can be targeted; we are not targeting opinion-based or open-ended questions. ● 23% of questions in a random sample of 1.15 lakh (115,000) belong to these 10 domains.
Is it something similar to the AnA platform? ● Our organization has a discussion forum called the AnA (Ask and Answer) platform. ● InstantQA has no relation whatsoever to, and no direct use case with, the current AnA forum contents, as of now.
What kind of questions do we target?
FACTOIDS (targeted): ● What is the price of X? ● When is the last date of Y? ● How much is the fee for W? ● What is the fee for W? ● Which company hire from campus Q? ● How is the placement at Z? ● Is Z college in Delhi? (transform to where)
Open ended, not definite (not targeted): ● What is meaning of life, universe and everything? ● I do not feel like studying, what to do? ● Will I get admission in Z? ● How to improve my career? ● Should I invest in noida? ● I have purchased X project, should I sell it now or hold? ● Is it beneficial to buy 2bhk in 30 lacs?
What is the very basic approach to instant question answering? ● General architecture: question → Question Classification and Analysis → Information Retrieval → Answer Extraction → answer
● e.g. Question: What is Calvados? → Analysis: Query = “Calvados is”, pattern /Q is /A where /Q = “(Calvados)” → Text retrieval: “…Calvados is often used in cooking… Calvados is a dry apple brandy made in…” → Extraction: /Q is /A: “Calvados” is “a dry apple brandy” → Answer: “a dry apple brandy”
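The Calvados walk-through above can be sketched in a few lines. This is only a minimal illustration of the "/Q is /A" pattern, not the system's actual extractor: the function name `extract_definition` and the rule "an article plus at most three words" for delimiting /A are assumptions made for this example.

```python
import re

def extract_definition(question, passages):
    """Answer a 'What is X?' question with the surface pattern '/Q is /A'."""
    # Question analysis: pull the subject /Q out of "What is /Q?".
    m = re.match(r"\s*what\s+is\s+(.+?)\s*\?*\s*$", question, flags=re.IGNORECASE)
    if m is None:
        return None
    subject = m.group(1)
    # Assumption: /A starts with an article and spans at most three more words.
    pattern = re.compile(
        re.escape(subject) + r"\s+is\s+((?:a|an|the)\s+\w+(?:\s+\w+){0,2})",
        flags=re.IGNORECASE,
    )
    # Answer extraction: first retrieved passage that matches the pattern wins.
    for passage in passages:
        hit = pattern.search(passage)
        if hit:
            return hit.group(1)
    return None

passages = [
    "Calvados is often used in cooking.",
    "Calvados is a dry apple brandy made in ...",
]
print(extract_definition("What is Calvados?", passages))
```

With the two retrieved passages from the slide, the first one is skipped (no article after "is") and the second yields "a dry apple brandy".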
If it is so simple, why haven't you done it already?
There are challenges in QA! ● Quality of text data. ● Language variability (paraphrase). ● Knowledge base domain: the answer has to be supported by the collection, not by the current state of the world. ● How to locate the information given the question keywords. ● It is unlikely that a system will have all necessary resources pre-computed. ● The task requires some deduction or extra linguistic knowledge. ● How does a reasoning system find relevant pieces of information?
Do we have any prior research to tackle these challenges?
OK, the investigation is done. But how do we actually do it?
Knowledge base generation
Knowledge base generation PHASE 1
Knowledge base generation: Example (PHASE 1)
● Fact: The fees for Btech course in IIT D is 24000 INR.
● Tagged fact: The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.
● Extracted fields: Fees, Btech, IIT D, 24000
● Question forms: What is the fees of Btech course at IIT Delhi? / How much is the fees for Btech course from IIT Delhi? / How many INR is the fees of btech from iit delhi? / What ….........
● Index: Btech, iit d, fees, 24000, INR
Answer Retrieval: Example
● Already indexed knowledge base. Trained once at startup.
● Question: How much will I pay for btech from IIT D?
● Tagged question: How much will I <<pay for>> <<btech>> from <<IIT D>>?
● Focus: How Much ● Object: Pay ● Class: quantity to pay, fees
● Rank and prune the best answer based on collective match.
● Candidate answers: You should pay 24000 INR for Btech from IIT D. / The fees for Btech from IITD is 24000 INR. / 24000 INR should be paid for Btech from IIT D.
● Consistency checks
So many boxes!! Let us check out the major components in brief.
A.1. Fact phrase generator from structured listings ● Structured listing to factoid text. ● No need to rely only on user-generated sentences. ● Use basic language model techniques to create sentences from templates. ● Example: <doc> ….. <college_name>iit</college_name> <college_id>13213</college_id> <fee>54000 inr annual</fee> <location>delhi</location> ….... </doc> → Language Model → Fee of iit delhi is 54000 inr annual.
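A rough sketch of the listing-to-fact step, using the example record above. It swaps the language model for plain template filling, so it only illustrates the idea; `FACT_TEMPLATES` and `listing_to_facts` are hypothetical names, and the record contains only the fields visible on the slide.

```python
import xml.etree.ElementTree as ET

# Example listing record; only the fields shown on the slide are included.
LISTING_XML = """<doc>
  <college_name>iit</college_name>
  <college_id>13213</college_id>
  <fee>54000 inr annual</fee>
  <location>delhi</location>
</doc>"""

# Hypothetical template table: one sentence template per listing attribute.
FACT_TEMPLATES = {
    "fee": "Fee of {college_name} {location} is {fee}.",
}

def listing_to_facts(xml_text, templates=FACT_TEMPLATES):
    """Turn one structured listing into a list of factoid sentences."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    facts = []
    for attribute, template in templates.items():
        if not fields.get(attribute):
            continue  # the listing has no value for this attribute
        try:
            facts.append(template.format(**fields))
        except KeyError:
            continue  # a field the template needs is missing; skip the fact
    return facts

print(listing_to_facts(LISTING_XML))
```

Adding a new fact type then only means adding one more entry to the template table.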
A.2. Template Generator ● Start with identifying: – Answer Type – Entities in focus – Part of Speech tags ● With these tags and language grammar rules, a factoid sentence can be converted into all possible question forms (Question Generation, QG task).
● Example fact: Fee of iit delhi is 54000 inr annually. – Answer type: quantity – focus: fee – entity: iit + delhi – Pos tags etc.
● Question forms: What is the fee of iit delhi annually? / What is the fee of iit delhi / How much is the fee of iit delhi? / Is fee of iit delhi 54000 inr?
● Templates: Fee of <II> <LL> is <$$>. / Fees of <II> <LL> is <$$>. / Cost of <II> <LL> is <$$>.
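The question generation step could look roughly like this. It is a simplified sketch: a single regular expression stands in for the answer-type, entity and POS tagging, and it assumes every matched fact has answer type "quantity". `FACT_PATTERN`, `QUESTION_TEMPLATES` and `questions_from_fact` are names invented for this example.

```python
import re

# Assumption: facts follow the "<focus> of <entity> is <value>." shape from the slide.
FACT_PATTERN = re.compile(
    r"^(?P<focus>fees?|cost) of (?P<entity>.+?) is (?P<value>.+?)\.?$",
    re.IGNORECASE,
)

# Hypothetical question templates per answer type.
QUESTION_TEMPLATES = {
    "quantity": [
        "What is the {focus} of {entity}?",
        "How much is the {focus} of {entity}?",
        "Is {focus} of {entity} {value}?",
    ],
}

def questions_from_fact(fact, answer_type="quantity"):
    """Convert one factoid sentence into its question forms."""
    m = FACT_PATTERN.match(fact.strip())
    if m is None:
        return []
    slots = {
        "focus": m.group("focus").lower(),
        "entity": m.group("entity"),
        "value": m.group("value"),
    }
    return [t.format(**slots) for t in QUESTION_TEMPLATES.get(answer_type, [])]

for q in questions_from_fact("Fee of iit delhi is 54000 inr annually."):
    print(q)
```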
B.1. Text Preprocessing ● Short-forms – i’m, im, i m → i am – can’t, cant, can t → can not ● Spelling correction ● Repeated punctuation (!!!, ???, …) ● Smilies ● Salutations (Hi all, Hiya, etc.) ● Names, signature, course codes
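A few of these steps can be sketched with regular expressions. The patterns below are illustrative assumptions covering only the examples listed on the slide; spelling correction and the removal of names, signatures and course codes are left out.

```python
import re

# Smileys such as :) ;-) :D, only when they stand alone between spaces.
SMILEY = re.compile(r"(?<!\S)[:;=]-?[)(DP](?!\S)")
# Runs of the same punctuation mark: "!!!" -> "!", "???" -> "?", "..." -> "."
REPEATED_PUNCT = re.compile(r"([!?.])\1+")
# Leading salutations from the slide, plus trailing punctuation and spaces.
SALUTATION = re.compile(r"^\s*(?:hi all|hiya|hi|hello)\b[\s,!.]*", re.IGNORECASE)
# Short-forms and their expansions.
SHORT_FORMS = [
    (re.compile(r"\bi['’ ]?m\b", re.IGNORECASE), "i am"),
    (re.compile(r"\bcan['’ ]?t\b", re.IGNORECASE), "can not"),
]

def preprocess(text):
    """Apply the cleaning steps in a fixed order and normalise whitespace."""
    text = SMILEY.sub(" ", text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    text = SALUTATION.sub("", text)
    for pattern, expansion in SHORT_FORMS:
        text = pattern.sub(expansion, text)
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Hi all!!! im sure i cant pay the fee??? :)"))
```

The order matters here: repeated punctuation is collapsed before the salutation is stripped, so "Hi all!!!" disappears as one unit.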
B.2. Entity and POS Tagger ● QER – Names, locations etc. ● Part of Speech Tagger using word sequence patterns – Sequence (noun, verbs, auxiliaries, modifiers) ● Phrase Chunker ● Dependency parsing : validate tag relationships
B.3. Question Analysis ● Create features to be used during answer extraction. ● Identify keywords to be matched in document sentences. ● Identify the answer type to match answer candidates. We can create an inventory of questions and expected answer types, and so train a classifier – Quantity? – Dates? – Definition? ● Select a list of useful patterns from a pattern repository. ● Identify question relations which may be used for sentence analysis, etc.
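As a rough illustration of question analysis, the sketch below derives two features: an answer type and a keyword list. Hand-written rules stand in for the trained classifier mentioned above, and both the rule list and the stopword list are assumptions made for this example.

```python
import re

# Assumption: ordered hand-written rules in place of a trained classifier.
ANSWER_TYPE_RULES = [
    ("quantity", re.compile(r"^(how much|how many)\b|\b(fee|fees|price|cost|salary)\b")),
    ("date", re.compile(r"^when\b|\b(last date|deadline)\b")),
    ("definition", re.compile(r"^what is\b")),
]

# Small illustrative stopword list, not a complete one.
STOPWORDS = {"what", "when", "how", "much", "many", "is", "the",
             "of", "for", "a", "an", "in", "at", "to"}

def analyse_question(question):
    """Return the expected answer type and the keywords to match in documents."""
    text = question.lower().strip()
    answer_type = "unknown"
    for label, rule in ANSWER_TYPE_RULES:
        if rule.search(text):
            answer_type = label
            break  # first matching rule wins
    tokens = re.findall(r"[a-z0-9]+", text)
    keywords = [t for t in tokens if t not in STOPWORDS]
    return {"answer_type": answer_type, "keywords": keywords}

print(analyse_question("How much is the fee for W?"))
print(analyse_question("When is the last date of Y?"))
```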
B.4. Query Formulation ● The question needs to be transformed into a query for the document retrieval system. ● Each IR system has its own query language, so we need to perform this mapping. ● Identify useful keywords; use the type of answer sought, entities to boost, etc. ● Query creation: ordered terms, combined terms, weighted terms.
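One possible mapping from question features to a query string is sketched below. The output uses a Lucene-style `term^boost` notation purely as an assumed target syntax; the real mapping depends on the IR system chosen. The `answer_type` field name and the boost values are hypothetical.

```python
def formulate_query(keywords, entities, answer_type,
                    entity_boost=2.0, type_boost=1.5):
    """Build a weighted query string from question-analysis features."""
    terms = []
    # Ordered terms: each entity is kept together as a quoted phrase and boosted.
    for entity in entities:
        terms.append((f'"{entity}"', entity_boost))
    # Remaining keywords get a neutral weight; words already inside an
    # entity phrase are not repeated.
    entity_words = {word for entity in entities for word in entity.split()}
    for keyword in keywords:
        if keyword not in entity_words:
            terms.append((keyword, 1.0))
    # Weighted term on a hypothetical answer_type field.
    terms.append((f"answer_type:{answer_type}", type_boost))
    # Combined terms: join everything into one query string.
    return " ".join(f"{term}^{weight:g}" for term, weight in terms)

print(formulate_query(["fee", "btech", "iit", "delhi"], ["iit delhi"], "quantity"))
```

For the fee question this produces `"iit delhi"^2 fee^1 btech^1 answer_type:quantity^1.5`.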
B.5. Answer Candidate Searcher ● Index the <question, qtypes, entities, answer template> tuples in a training corpus. ● Retrieve a set of n <question, qtypes, entities, answer template> tuples given a new question. ● Decide the best answer to the new question based on the scores of the answers returned.
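The index, retrieve and decide steps can be sketched with an in-memory list. The scoring function (word overlap plus bonuses for matching question type and entities) and the `min_score` threshold are assumptions for illustration, not the system's actual ranking. The second index entry uses a placeholder answer because the source gives none.

```python
import re
from collections import namedtuple

# One indexed tuple: <question, qtype, entities, answer template>.
Entry = namedtuple("Entry", "question qtype entities answer")

INDEX = [
    Entry("What is the fees of Btech course at IIT Delhi?", "quantity",
          ("btech", "iit delhi"),
          "The fees for Btech course in IIT D is 24000 INR."),
    # Placeholder answer: the source does not give one for this question.
    Entry("When is the last date of Y?", "date", ("y",),
          "The last date of Y is <date>."),
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(entry, question, qtype, entities):
    """Assumed score: Jaccard word overlap + type match + shared entities."""
    q, e = tokens(question), tokens(entry.question)
    overlap = len(q & e) / len(q | e) if (q | e) else 0.0
    type_bonus = 1.0 if entry.qtype == qtype else 0.0
    entity_bonus = len(set(entities) & set(entry.entities))
    return overlap + type_bonus + entity_bonus

def search(index, question, qtype, entities, n=3):
    """Retrieve the n best-scoring entries for a new question."""
    ranked = sorted(index, key=lambda en: score(en, question, qtype, entities),
                    reverse=True)
    return ranked[:n]

def best_answer(index, question, qtype, entities, min_score=1.0):
    """Return the top answer, or None if even the best candidate scores too low."""
    candidates = search(index, question, qtype, entities)
    if not candidates:
        return None
    top = candidates[0]
    if score(top, question, qtype, entities) < min_score:
        return None
    return top.answer

print(best_answer(INDEX, "How much will I pay for btech from IIT Delhi?",
                  "quantity", ["btech", "iit delhi"]))
```

Returning None below the threshold lets open-ended questions fall through to the regular search or the community forum instead of receiving a poor instant answer.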
Where do we need Natural Language Processing? ● Tokenisation (words, numbers, punctuation, whitespace) ● Sentence detection ● Part of speech tagging (verbs, nouns, pronouns, etc.) ● Query entity recognition ● Chunking/Parsing (noun/verb phrases and relationships) ● Statistical modelling tools ● Dictionaries, word-lists, WordNet, VerbNet ● Template generation using grammar rules.
So you are telling me there are ready-made NLP tools?
NLP tools problems ● Training data issues – Training domains are completely different. – Local English language: slang, spelling, localisation ● Sentence detection failures: – Bad style (capitalisation, punctuation) – Ellipsis (i tried... it failed... error message...) ● Tokenisation failures: – Multiple punctuation ???, !!! (student emphasis) – Abbreviations (im, m.b.a, cant, doesnt, etc.) ● POS errors – Spelling, grammar ● We need to experiment, modify code and train on our domain data!
What are the use cases of instant QA? How does it fit in our system?
Interaction ● If users are not writing good English, then try to minimize how much they have to write. We can focus on capturing user intent with the least amount of typed text. ✔ Auto complete ✔ Guidance ✔ Spell check ✔ Auto correct ✔ Manual feedback on conflicts ✔ Make them write good queries ● This not only helps user experience but also increases the accuracy of language-based statistical systems.
Shiksha : main search & cafe search
Shiksha : Integration with main search auto-suggestor We will already be generating good-quality questions; these could be integrated here.
99acres ● Use cases similar to Shiksha. ● The real estate domain has more open-ended opinion questions and very few factoid questions. ● If a single text box search is introduced in future – SRP can serve not only listings but also question answers – Instant QA would be really helpful for user experience.
And many more use cases... Plus, some components of this system will be utilized separately to improve other existing systems.