Tathagata Dey - Research

Papers & Book Chapters

Facts-and-Feelings: Capturing both Objectivity and Subjectivity in Table-to-Text Generation

Tathagata Dey, Pushpak Bhattacharyya. arXiv preprint arXiv:2406.10560 (2024)

Table-to-text generation, a long-standing challenge in natural language generation, has remained unexplored through the lens of subjectivity. Subjectivity here encompasses the comprehension of information derived from the table that cannot be described solely by objective data. Given the absence of pre-existing datasets, we introduce the Ta2TS dataset with 3849 data instances. We perform the task of fine-tuning sequence-to-sequence models on the linearized tables and prompting on popular large language models. We analyze the results from a quantitative and qualitative perspective to ensure the capture of subjectivity and factual consistency. The analysis shows the fine-tuned LMs can perform close to the prompted LLMs. Both the models can capture the tabular data, generating texts with 85.15% BERTScore and 26.28% Meteor score. To the best of our knowledge, we provide the first-of-its-kind dataset on tables with multiple genres and subjectivity included and present the first comprehensive analysis and comparison of different LLM performances on this task.

Read Paper ResearchGate

Identification and Computational Analysis of Mutations in SARS-CoV-2

Tathagata Dey, Shreyans Chatterjee, et al. Elsevier, Computers in Biology and Medicine, Volume 129, 2021

SARS-CoV-2 infection has become a worldwide pandemic and is spreading rapidly to people across the globe. To combat the situation, vaccine design is the essential solution. Mutation in the virus genome plays an important role in limiting the working life of a vaccine. In this study, we have identified several mutated clusters in the structural proteins of the virus through our novel 2D Polar plot and qR characterization descriptor. We have also studied several biochemical properties of the proteins to explore the dynamics of evolution of these mutations. This study would be helpful to understand further new mutations in the virus and would facilitate the process of designing a sustainable vaccine against the deadly virus.

Read Paper ResearchGate

Applications of alignment-free sequence descriptors in the characterization of sequences in the age of big data: a case study with Zika virus, SARS, MERS, and COVID-19

Dwaipayan Sen, Tathagata Dey, et al. Big Data Analytics in Chemoinformatics and Bioinformatics, Elsevier, 2023

The Big Data problem is the computational challenge of dealing with an enormous volume of information. With the advent of next-generation sequencing technologies and other methods, a huge amount of data is collected and stored every day. Mathematical descriptors alone are not sufficient to process these data and extract useful information. This chapter therefore focuses on combining the bioinformatic concept of alignment-free sequence descriptors with Big Data architecture to find a practical solution to the problem.

Read Paper ResearchGate

Identification of Generalized Peptide Regions for Designing Vaccine Effective for All Significant Mutated Strains of SARS-CoV-2

S. Biswas, S. Manna, et al. Combinatorial Chemistry & High Throughput Screening, Bentham Science, 2021

Coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 has become a worldwide pandemic and created an utmost crisis across the globe. To mitigate the crisis, the design of vaccines is a crucial solution. The frequent mutation of the virus demands generalized vaccine candidates, which would be effective for all mutated strains at present and for the strains that would evolve due to further new mutations in the virus. Objective: The objective of this study is to identify more frequently occurring mutated variants of SARS-CoV-2 and to suggest peptide vaccine candidates effective in common against the viral strains considered. Method: In this study, we have identified all currently prevailing mutated strains of SARS-CoV-2 through 2D Polar plot and Quotient Radius (qR) characterization descriptor. Then, by considering the top eight mutation strains, which are significant due to their frequency of occurrence, peptide regions suitable for vaccine design have been identified with the help of a mathematical model – 2D Polygon Representation, followed by the evaluation of epitope potential and ensuring that there is no case of any autoimmune threat. Lastly, in order to verify whether this entire approach is applicable for vaccine design against any other virus in general, we have made a comparative study between the peptide vaccine candidates prescribed for the Zika virus using the current approach and a list of potential vaccine candidates for the same already established in the past. Results: We have finally suggested three generalized peptide regions which would be suitable as sustainable peptide vaccine candidates against SARS-CoV-2 irrespective of its currently prevailing strains as well as any other variant of the same that may appear in the future. We also observed that during the comparative study using the case of E protein of Zika virus, the peptide regions suggested using the new approach matched with the already established results.
Conclusion: The study, therefore, illustrates an approach that would help in developing peptide vaccine against SARS-CoV-2 by suggesting those peptide regions which can be targeted irrespective of any mutated form of this virus. The consistency with which this entire approach was also able to figure out similar vaccine candidates for Zika virus with utmost accuracy proves that this protocol can be extended for peptide vaccine design against any other virus in the future.

Read Paper ResearchGate

Cluster analysis of coronavirus sequences using computational sequence descriptors: With applications to SARS, MERS and SARS-CoV-2 (CoVID-19)

M. Vracko, S.C. Basak, et al. Combinatorial Chemistry & High Throughput Screening, Bentham Science, 2021

Coronaviruses comprise a group of enveloped, positive-sense single-stranded RNA viruses that infect humans as well as a wide range of animals. The study was performed on a set of 573 sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) viruses. The sequences were represented with alignment-free sequence descriptors and analyzed with different chemometric methods: Euclidean/Mahalanobis distances, principal component analysis and self-organizing maps (Kohonen networks). We report the cluster structures of the data. The sequences are well-clustered regarding the type of virus; however, some of them show the tendency to belong to more than one virus type. Background: This is a study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) coronaviruses. Objectives: The aim was to compare the virus sequences, which originate from different places around the world. Methods: The study used alignment-free sequence descriptors for the representation of sequences and chemometric methods for analyzing clusters. Results: The majority of genome sequences are clustered with respect to the virus type, but some of them are outliers. Conclusion: We indicate 71 sequences, which tend to belong to more than one cluster.

Read Paper ResearchGate

Emergence of a Pathogenic Strain of COVID-19

Shreyans Chatterjee, Tathagata Dey, et al. Journal of Bioinformatics and Systems Biology 3 (2020): 081-091.

The SARS-CoV-2 pandemic, which started in Wuhan, China, has now spread worldwide, with the infection count exceeding 41 million. Within a short time span, many mutations have been occurring continuously in the viral genome, whether point mutations or frameshift mutations. Scientists have been suggesting that one of those numerous point mutations is becoming prevalent by replacing all the initial Wuhan strains of SARS-CoV-2. In this work, we have conducted rigorous bioinformatic analyses and compared the properties of the wild and mutant strains to identify the changes. Our theoretical results indicate that the strain with a change at amino acid position 614 is more pathogenic and infective, which agrees with one or a few publications mentioning the emergence of a new pathogenic D614G strain. Here we describe our approach to arrive at the conclusion.

Read Paper ResearchGate

Novel Algorithms for In Silico Peptide Vaccine Design with Reference to Ebola Virus

S. Biswas, T. Dey, et al. 2020 IEEE International Conference on Computer, Electrical & Communication Engineering (ICCECE), Kolkata, India

Viral epidemics have posed a problem for quick development of drugs and vaccines to control the menace. A case in point is the Ebola viral disease with high fatality ratio in Africa. It is making a comeback in the Democratic Republic of Congo (DRC), after its rampage in West Africa in 2014-16 that has spawned fears of leading to a pandemic. Vaccines such as the experimental rVSV-ZEBOV have provided protection in 70-80% of the cases, but such vaccines are in short supply and doubts exist about their availability and sustainability in pandemic cases. Peptide vaccines promise to fill this lacuna: as chemical constructs, they can be scaled up to requirement in a manufacturing set-up, are easy to produce in pure form, and can be stored and transported much more easily and economically than traditional vaccines. Although no peptide vaccines have been licensed yet for human use, the rapid growth of applications of in silico approaches to peptide vaccine design and application to a myriad of virus infections, and subsequent follow-up experimental work, have led to expectations of licensures in the near future. We have proposed a protocol to automate the search procedure using mathematical and computational modelling approaches to generate peptide libraries that promote long life of such vaccines even in the face of rapid mutational changes in the viral sequences. In this paper, we outline the mathematical model we have used and the recent improvements in the techniques to ensure the best recommendations for peptide vaccine libraries, especially against the Ebola virus that threatens to spill over the Congo border and cause epidemics and pandemics in a globalized world.

Read Paper ResearchGate

New Computational Analysis to Identify the Mutational Changes in SARS-CoV-2

Tathagata Dey, Shreyans Chatterjee, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2020 At: UMN, Duluth, USA, Volume: 6

The ongoing rapid spread of COVID-19 disease from its first detection in Wuhan, China in late 2019 was declared a pandemic by the World Health Organization on 11th March, 2020. It is believed that to combat this deadly virus, now designated as SARS-CoV-2, designing and developing a proper vaccine is the best solution. For developing a sustainable vaccine against this virus, one should have a proper understanding of the mutational changes occurring constantly in its genome and also about the variations that may arise in different communities. Here, we report an algorithm to identify and characterize the mutational changes in the COVID-19 sequences isolated from different countries. The patterns in mutation along with the demographic analysis shown here can be very effective for community specific vaccine designing in the future.

Read Paper ResearchGate

2D Polar Co-ordinate Representation of Amino Acid Sequences With some applications to Ebola virus, SARS and SARS-CoV-2 (COVID-19)

Tathagata Dey, Subhamoy Biswas, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2020 At: UMN, Duluth, USA, Volume: 6

We consider a novel approach to mathematically define a graphing method to represent amino acid sequences of proteins in a two-dimensional plane and characterize them numerically. The amino acids are represented by the relative magnitude of their hydrophobicity.

Read Paper ResearchGate

In Silico Approach for Peptide Vaccine Design for CoVID 19

S. Biswas, S. Chatterjee, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2020 At: UMN, Duluth, USA, Volume: 5

The currently surging SARS-CoV-2 (or CoVID-19) is challenging the public health authorities worldwide. As of now there is no approved vaccine or drug available for the control of the viral disease. Therefore, non-pharmaceutical interventions (NPIs) are being used around the world to manage the spread of CoVID-19. In this article we used a computer-assisted vaccine design (CAVD) approach to develop a set of most probable peptide vaccine candidates which can be tested for their efficacy by wet lab experiments.

Read Paper ResearchGate

A novel approach to Peptide Vaccine Design for Ebola virus

S. Biswas, T. Dey, et al. MOL2NET, USINEWS-04: US-IN-EU Workshop Series, 2019 At: UMN, Duluth, USA, Volume: 5

In this paper, we have described the algorithms, based on mathematical and computational modelling, that are essential for designing peptide vaccines.

Read Paper ResearchGate

Talks

Novel Algorithms for In Silico Peptide Vaccine Design with reference to Ebola Virus

2020

IEEE International Conference on Computer, Electrical & Communication Engineering (ICCECE), Kolkata, India

View Presentation

A preliminary analysis and prediction method of mutations in different viruses using mathematical 𝒒𝑹 characterization and pattern recognition

2021

International Symposium on Drug Design and Development Research Global Institute of Pharmaceutical Education and Research, Kashipur, India

View Presentation

Computer-aided new approach for the surveillance and peptide vaccine design for emerging global pathogens like Zika and Covid

2021

Vaccine Summit, Ohio, USA

View Presentation

Computational Analysis and Peptide Vaccines As New Materials Design of Anti-Virus Therapeutics and Prophylactics

2021

Online FDP on “Advanced Materials & Engineering”, College of Engineering & Management, Kolaghat Kolkata, India

View Presentation

An Improved Methodology for Peptide Vaccine design with reference to Ebola Virus

2019

Impromptu Seminar on Discrete Mathematical Modeling in Physics, Kolkata, India

View Presentation

Posters

Enhancing Table-to-Text Generation: A Focus on Subjectivity and Objectivity with the TaTS Dataset

2024

Research and Innovation Symposium in Computing, IIT Bombay, Mumbai, India

View Poster

2D Polar Co-Ordinate Representation of Amino Acid Sequences

2020

National Conference on Frontiers in Modern Physics, Kolkata, India

View Poster

A novel Algorithm for Peptide Vaccine Design with reference to Ebola Virus

2019

MOL2NET 2019, Duluth, USA

View Poster
