Data scientists look for new ways to analyze "big data" and extract new kinds of information from it. This could be in fields such as:
- epidemiology
- physics
- population dynamics
- biology/biochemistry/genetics
- economics (trying to predict the stock market)
For example, we might take Census data on populations and try to develop new methods to predict trends, such as which cities will have the highest percentage of elderly residents in 20-30 years, which might indicate that more hospital beds will be needed there. We use established techniques like hierarchical clustering, PCA, t-SNE plots, and of course statistics to draw these conclusions and estimate our certainty. Sometimes entirely new algorithms are developed, but most of the time the work is making improvements on existing methods.
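To make the clustering/PCA workflow concrete, here is a minimal sketch on made-up census-style data (the city counts, age brackets, and cluster count are all hypothetical; nothing here is tied to real Census files):

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical data: 20 cities x 5 age brackets, each row a city's
# population shares across the brackets (rows sum to 1).
X = rng.dirichlet(np.ones(5), size=20)

# PCA projects the 5-dimensional age profiles down to 2 components,
# which is what you would plot or feed into t-SNE for visualization.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

# Hierarchical clustering (Ward linkage) groups cities with similar
# age structure; cutting the tree at 3 clusters yields labels.
Z = linkage(X2, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
```

Cities that land in the same cluster have similar age profiles today, which is the kind of structure you would then project forward to anticipate demand for services like hospital beds.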
Another big problem we often encounter is how to deal with points that have missing data. Do you eliminate them? Do you impute the missing values?
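Both options above can be sketched in a few lines of pandas on a small hypothetical table (the column names and values are invented for illustration; NaN marks the missing entries):

```python
import numpy as np
import pandas as pd

# Hypothetical survey data with two missing entries.
df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "income": [50_000, 60_000, np.nan, 55_000]})

# Option 1: eliminate any row with a missing value.
dropped = df.dropna()

# Option 2: impute, here with the column mean. More sophisticated
# schemes exist (k-NN, model-based imputation), but the trade-off is
# the same: you keep all rows at the cost of inventing plausible values.
imputed = df.fillna(df.mean())
```

Dropping rows shrinks the dataset (here from 4 rows to 2), while imputation keeps every row but can bias downstream statistics if the data are not missing at random.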
And then the big one is harmonizing different datasets. In biology, one major challenge is how to jointly interpret data types like RNA-seq, ChIP-seq, and mass spectrometry. RNA-seq measures transcription and transcript expression, ChIP-seq measures transcription factor binding at promoters, and mass spectrometry measures protein levels (not for all proteins, but for some isoforms). Can we integrate the three types of data to draft Gene Regulatory Networks (GRNs)?
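The first step of that integration is usually just harmonizing the tables: each assay reports values keyed by gene, and an inner join keeps only the genes measured in all three. A toy sketch, where the gene IDs, column names, and values are all hypothetical (real pipelines would also reconcile gene identifiers and normalize units before merging):

```python
import pandas as pd

# One small table per assay, keyed by gene (hypothetical values).
rna = pd.DataFrame({"gene": ["A", "B", "C"],
                    "tpm": [10.0, 5.0, 0.1]})           # RNA-seq expression
chip = pd.DataFrame({"gene": ["A", "B", "D"],
                     "peak_score": [8.0, 2.0, 7.0]})    # TF binding at promoter
ms = pd.DataFrame({"gene": ["A", "B", "C"],
                   "protein_abundance": [1e6, 4e5, 2e3]})  # mass spec

# Inner joins keep only genes present in all three assays;
# gene C (no ChIP peak) and gene D (no RNA/protein data) drop out.
merged = rna.merge(chip, on="gene").merge(ms, on="gene")
```

Drafting GRN edges, for example by relating a transcription factor's binding scores to its targets' expression and protein levels, would then operate on this merged table.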
Basically, we develop new techniques to harmonize, filter, and analyze data, but we also very often do more derivative work, such as simply improving established methods.