For the past three years, I have been developing generative AI solutions for real-world problems. I have worked at the CFILT Lab, IIT Bombay, under Prof. Pushpak Bhattacharyya in NLP. We developed research solutions to industrial challenges, published papers, and built practical applications.
Before that, I worked at CIRE, Kolkata, under the late Dr. Ashesh Nandy, Prof. Subhas C. Basak, and Dr. Smarajit Manna in Computational Biology and Bioinformatics. This group was responsible for one of the early contributions to graphical sequence characterization in genetics. We published 10 research items over this four-year period.
Ongoing Research
Advanced Language Models for Low-Resource Languages
Multimodal Learning for Healthcare Applications
Genomic Mutation Prediction using Language Modelling
Table-to-text generation, a long-standing challenge in natural language generation, has remained unexplored through the lens of subjectivity. Subjectivity here encompasses the comprehension of information derived from the table that cannot be described solely by objective data. Given the absence of pre-existing datasets, we introduce the Ta2TS dataset with 3849 data instances. We perform the task of fine-tuning sequence-to-sequence models on the linearized tables and prompting on popular large language models. We analyze the results from a quantitative and qualitative perspective to ensure the capture of subjectivity and factual consistency. The analysis shows the fine-tuned LMs can perform close to the prompted LLMs. Both the models can capture the tabular data, generating texts with 85.15% BERTScore and 26.28% Meteor score. To the best of our knowledge, we provide the first-of-its-kind dataset on tables with multiple genres and subjectivity included and present the first comprehensive analysis and comparison of different LLM performances on this task.
Identification and Computational Analysis of Mutations in SARS-CoV-2
Tathagata Dey, Shreyans Chatterjee, et al. Elsevier, Computers in Biology and Medicine, Volume 129, 2021
SARS-CoV-2 infection has become a worldwide pandemic and is spreading rapidly to people across the globe. To combat the situation, vaccine design is the essential solution. Mutation in the virus genome plays an important role in limiting the working life of a vaccine. In this study, we have identified several mutated clusters in the structural proteins of the virus through our novel 2D Polar plot and qR characterization descriptor. We have also studied several biochemical properties of the proteins to explore the dynamics of evolution of these mutations. This study would be helpful to understand further new mutations in the virus and would facilitate the process of designing a sustainable vaccine against the deadly virus.
Applications of alignment-free sequence descriptors in the characterization of sequences in the age of big data: a case study with Zika virus, SARS, MERS, and COVID-19
Dwaipayan Sen, Tathagata Dey, et al. Big Data Analytics in Chemoinformatics and Bioinformatics, Elsevier, 2023
The Big Data problem is the computational challenge of dealing with a very large volume of information. With the advent of next-generation sequencing technologies and other methods, a huge amount of data is collected and stored every day. Mathematical descriptors alone are not sufficient to process these data and extract useful information. This chapter therefore focuses on combining the bioinformatic concept of alignment-free sequence descriptors with Big Data architecture to find a workable solution to the problem.
Identification of Generalized Peptide Regions for Designing Vaccine Effective for All Significant Mutated Strains of SARS-CoV-2
S. Biswas, S. Manna, et al. Combinatorial Chemistry & High Throughput Screening, Bentham Science, 2021
Coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 has become a worldwide pandemic and created an utmost crisis across the globe. To mitigate the crisis, the design of vaccines is a crucial solution. The frequent mutation of the virus demands generalized vaccine candidates, which would be effective for all mutated strains at present and for the strains that would evolve due to further new mutations in the virus. Objective: The objective of this study is to identify more frequently occurring mutated variants of SARS-CoV-2 and to suggest peptide vaccine candidates effective in common against the viral strains considered. Method: In this study, we have identified all currently prevailing mutated strains of SARS-CoV-2 through 2D Polar plot and Quotient Radius (qR) characterization descriptor. Then, by considering the top eight mutation strains, which are significant due to their frequency of occurrence, peptide regions suitable for vaccine design have been identified with the help of a mathematical model – 2D Polygon Representation, followed by the evaluation of epitope potential and ensuring that there is no case of any autoimmune threat. Lastly, in order to verify whether this entire approach is applicable for vaccine design against any other virus in general, we have made a comparative study between the peptide vaccine candidates prescribed for the Zika virus using the current approach and a list of potential vaccine candidates for the same already established in the past. Results: We have finally suggested three generalized peptide regions which would be suitable as sustainable peptide vaccine candidates against SARS-CoV-2 irrespective of its currently prevailing strains as well as any other variant of the same that may appear in the future. We also observed that during the comparative study using the case of E protein of Zika virus, the peptide regions suggested using the new approach matched with the already established results.
Conclusion: The study, therefore, illustrates an approach that would help in developing peptide vaccine against SARS-CoV-2 by suggesting those peptide regions which can be targeted irrespective of any mutated form of this virus. The consistency with which this entire approach was also able to figure out similar vaccine candidates for Zika virus with utmost accuracy proves that this protocol can be extended for peptide vaccine design against any other virus in the future.
Cluster analysis of coronavirus sequences using computational sequence descriptors: With applications to SARS, MERS and SARS-CoV-2 (CoVID-19)
M. Vracko, S.C. Basak, et al. Combinatorial Chemistry & High Throughput Screening, Bentham Science, 2021
Coronaviruses comprise a group of enveloped, positive-sense single-stranded RNA viruses that infect humans as well as a wide range of animals. The study was performed on a set of 573 sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) viruses. The sequences were represented with alignment-free sequence descriptors and analyzed with different chemometric methods: Euclidean/Mahalanobis distances, principal component analysis and self-organizing maps (Kohonen networks). We report the cluster structures of the data. The sequences are well-clustered regarding the type of virus; however, some of them show the tendency to belong to more than one virus type. Background: This is a study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) coronaviruses. Objectives: The aim was to compare the virus sequences, which originate from different places around the world. Methods: The study used alignment-free sequence descriptors for the representation of sequences and chemometric methods for analyzing clusters. Results: The majority of genome sequences are clustered with respect to the virus type, but some of them are outliers. Conclusion: We indicate 71 sequences, which tend to belong to more than one cluster.
Shreyans Chatterjee, Tathagata Dey, et al. Journal of Bioinformatics and Systems Biology 3 (2020): 081-091.
SARS-CoV-2 pandemic starting from Wuhan, China has now been spreading worldwide making the infection count more than 41 million. Within a short time span, many mutations are continuously occurring in the viral genome, be it point mutation or frameshift mutation. Scientists have been suggesting that, one of those numerous point mutations is becoming prevalent by replacing all the initial Wuhan strains of SARS-CoV-2. In this work, we have conducted a rigorous bio-informatic analyses and compared the properties of wild and mutant strains to find out the changes. Eventually, it is considered to be a more pathogenic and infective strain by our theoretical reports with a change in amino acid position number 614, which coincidentally converges with one or few publications mentioning emergence of new pathogenic D614G strain. Here we describe our approach to arrive at the conclusion.
Novel Algorithms for In Silico Peptide Vaccine Design with Reference to Ebola Virus
S. Biswas, T. Dey, et al. 2020 IEEE International Conference on Computer, Electrical & Communication Engineering (ICCECE) At: Kolkata, India
Viral epidemics have posed a problem for quick development of drugs and vaccines to control the menace. A case in point is the Ebola viral disease with high fatality ratio in Africa. It is making a comeback in the Democratic Republic of Congo (DRC), after its rampage in West Africa in 2014-16 that has spawned fears of leading to a pandemic. Vaccines such as the experimental rVSV-ZEBOV has provided protection in 70-80% of the cases, but such vaccines are in short supply and doubts exist of its availability and sustainability in pandemic cases. Peptide vaccines promise to amend this lacuna as a chemical construct that can be scaled up to requirement in manufacturing set-up, are easy to produce in pure form and store as well as transport much more easily and economically than traditional vaccines. Although no peptide vaccines have been licensed yet for human use, the rapid growth of applications of in silico approaches to peptide vaccine design and application to a myriad of virus infections, and subsequent follow-up experimental work, have led to expectations of licensures in the near future. We have proposed a protocol to automate the search procedure using mathematical and computational modelling approaches to generate peptide libraries that promote long life of such vaccines even in the face of rapid mutational changes in the viral sequences. In this paper, we outline the mathematical model we have used and the recent improvements in the techniques to ensure the best recommendations for peptide vaccine libraries, especially against the Ebola virus that threatens to spill over the Congo border and cause epidemics and pandemics in a globalized world.
New Computational Analysis to Identify the Mutational Changes in SARS-CoV-2
Tathagata Dey, Shreyans Chatterjee, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2020 At: UMN, Duluth, USA, Volume: 6
The ongoing rapid spread of COVID-19 disease from its first detection in Wuhan, China in late 2019 was declared a pandemic by World Health Organization on 11th March, 2020. It is believed that to combat this deadly virus, now designated as SARS-CoV-2, designing and developing a proper vaccine is the best solution. For developing a sustainable vaccine against this virus, one should have a proper understanding of the mutational changes occurring constantly in its genome and also about the variations that may arise in different communities. Here, we report an algorithm to identify and characterize the mutational changes in the COVID-19 sequences isolated from different countries. The patterns in mutation along with the demographic analysis shown here can be very effective for community specific vaccine designing in the future.
2D Polar Co-ordinate Representation of Amino Acid Sequences With some applications to Ebola virus, SARS and SARS-CoV-2 (COVID-19)
Tathagata Dey, Subhamoy Biswas, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2020 At: UMN, Duluth, USA, Volume: 6
We consider a novel approach to mathematically define a graphing method to represent amino acid sequences of proteins in a two-dimensional plane and characterize them numerically. The amino acids are represented by the relative magnitude of their hydrophobicity.
In Silico Approach for Peptide Vaccine Design for CoVID 19
S. Biswas, S. Chatterjee, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2020 At: UMN, Duluth, USA, Volume: 5
The currently surging SARS-COV-2 (or CoVID-19) is challenging the public health authorities worldwide. As of now there is no approved vaccine or drug available for the control of the viral disease. Therefore, non-pharmaceutical interventions (NPIs) are being used around the world to manage the spread of CoVID-19. In this article we used a computer-assisted vaccine design (CAVD) approach to develop a set of most probable peptide vaccine candidates which can be tested for their efficacy by wet lab experiments.