Mid-Level Python Interview Questions

Curated mid-level Python interview questions for developers targeting mid-level positions. 38 questions available.

Python Interview Questions & Answers

Welcome to our comprehensive collection of Python interview questions and answers. This page contains expertly curated interview questions covering all aspects of Python, from fundamental concepts to advanced topics. Whether you're preparing for an entry-level position or a senior role, you'll find questions tailored to your experience level.

Our Python interview questions are designed to help you:

  • Understand core concepts and best practices in Python
  • Prepare for technical interviews at all experience levels
  • Master both theoretical knowledge and practical application
  • Build confidence for your next Python interview

Each question includes detailed answers and explanations to help you understand not just what the answer is, but why it's correct. We cover topics ranging from basic Python concepts to advanced scenarios that you might encounter in senior-level interviews.

Questions are labeled by difficulty level (Entry, Junior, Mid, Senior, Expert), and some focus specifically on code challenges. Each question is carefully crafted to reflect real-world interview scenarios you'll encounter at top tech companies, startups, and MNCs.

Questions

Q1:

What are Python descriptors and how are they used?

Mid

Answer

Descriptors define __get__, __set__, and __delete__ methods.
Used for managing attribute access, validation, computed attributes, and reusable logic.
They power @property functionality in Python.
Quick Summary: Descriptors implement __get__, __set__, and __delete__ to control attribute access on a class. Use them when an attribute needs validation or computation on access. @property is a built-in descriptor. A custom descriptor (for example a validator) typically captures its attribute name via __set_name__, validates in __set__, and stores the value in the instance's __dict__ (see the sketch below). SQLAlchemy and Django ORM use descriptors to define column types and relationships.
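
A minimal sketch of a validating descriptor; the Positive class and the Order example are illustrative, not from any specific library:

    class Positive:
        """Descriptor that only accepts values greater than zero."""

        def __set_name__(self, owner, name):
            # Called when the owning class is created; remember the attribute name.
            self.name = name

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self  # accessed on the class itself
            return obj.__dict__[self.name]

        def __set__(self, obj, value):
            if value <= 0:
                raise ValueError(f"{self.name} must be positive, got {value!r}")
            obj.__dict__[self.name] = value


    class Order:
        price = Positive()      # descriptor instances live on the class
        quantity = Positive()

        def __init__(self, price, quantity):
            self.price = price          # goes through Positive.__set__
            self.quantity = quantity


    order = Order(9.99, 3)
    # Order(9.99, 0) would raise ValueError
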
Q2:

What is __slots__ in Python and why use it?

Mid

Answer

__slots__ restricts dynamic attribute creation.
Saves memory by preventing per-instance __dict__.
Useful for memory-sensitive applications or many small objects.
Quick Summary: __slots__ declares a fixed set of attributes for a class, replacing the per-instance __dict__. Benefits: lower memory usage (no dict overhead per instance), slightly faster attribute access, prevents dynamic attribute creation. Use when creating many instances of a small class. Downside: less flexible, can't add new attributes dynamically, inheritance with slots requires care.
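
A small sketch comparing a slotted class to a regular one; the Point classes are illustrative:

    class PointDict:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    class PointSlots:
        __slots__ = ("x", "y")   # fixed attribute set, no per-instance __dict__

        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = PointSlots(1, 2)
    # p.z = 3  -> AttributeError: 'PointSlots' object has no attribute 'z'
    print(hasattr(PointDict(1, 2), "__dict__"))   # True
    print(hasattr(p, "__dict__"))                 # False - per-instance dict overhead saved
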
Q3:

Explain Python metaclasses and use cases.

Mid

Answer

Metaclasses control class creation.
Used for validation, enforcing interfaces, singletons, auto-registration.
Defined by extending type and overriding __new__.
Quick Summary: A metaclass controls class creation; type is the default metaclass. A custom metaclass subclasses type and overrides __new__(mcs, name, bases, attrs) to inspect or modify the class dictionary before the class object is built (see the sketch below). Use cases: auto-registering subclasses (plugin systems), validating class definitions, adding methods automatically (ORMs), enforcing coding standards. __init_subclass__ (Python 3.6+) is simpler for most of these use cases.
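
A minimal sketch of an auto-registering metaclass for a plugin system; the class names are illustrative:

    class PluginMeta(type):
        registry = {}

        def __new__(mcs, name, bases, attrs):
            cls = super().__new__(mcs, name, bases, attrs)
            if bases:  # skip the abstract base class itself
                PluginMeta.registry[name.lower()] = cls
            return cls

    class Plugin(metaclass=PluginMeta):
        pass

    class CsvExporter(Plugin):
        pass

    class JsonExporter(Plugin):
        pass

    print(PluginMeta.registry)
    # {'csvexporter': <class '__main__.CsvExporter'>, 'jsonexporter': <class '__main__.JsonExporter'>}
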
Q4:

How does Python handle threading and concurrency?

Mid

Answer

threading handles I/O-bound tasks.
GIL limits CPU-bound threads.
multiprocessing enables true parallelism.
concurrent.futures simplifies thread/process pool usage.
Quick Summary: Python threading: threads share memory, good for I/O-bound tasks (network, file I/O). GIL prevents true CPU parallelism. threading.Thread, Lock, RLock, Semaphore, Event, Condition. For CPU-bound tasks use multiprocessing (separate processes, no GIL). concurrent.futures.ThreadPoolExecutor for thread pool. AsyncIO for high-concurrency I/O without threads (event loop, single thread, cooperative multitasking).
Q5:

How do you implement asynchronous programming in Python?

Mid

Answer

Use async/await with asyncio.
async def defines async functions.
await pauses execution.
Ideal for I/O-bound workloads like network or DB operations.
Quick Summary: asyncio enables async programming with a single thread and event loop. async def defines a coroutine. await suspends the coroutine while waiting for I/O (doesn't block the thread - other coroutines run). asyncio.run() starts the event loop. asyncio.gather() runs coroutines concurrently. Use for: many concurrent I/O operations (HTTP, DB, websockets) with far fewer threads than traditional threading.
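
A minimal sketch of running several I/O-bound coroutines concurrently; asyncio.sleep stands in for real network or database calls:

    import asyncio

    async def fetch(name, delay):
        # Simulate an I/O wait; await hands control back to the event loop.
        await asyncio.sleep(delay)
        return f"{name} done after {delay}s"

    async def main():
        # gather() runs all three coroutines concurrently on one thread.
        results = await asyncio.gather(
            fetch("db", 1.0),
            fetch("api", 0.5),
            fetch("cache", 0.1),
        )
        print(results)

    asyncio.run(main())   # total runtime is roughly 1s, not 1.6s
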
Q6:

How do Python coroutines work?

Mid

Answer

Coroutines are defined with async def.
Execution suspends at await points.
Useful for pipelines, cooperative multitasking, and event-driven systems.
Quick Summary: Coroutines are functions defined with async def that can suspend at await points. When a coroutine awaits something (asyncio.sleep, aiohttp request), control returns to the event loop which runs other coroutines. Unlike threads, coroutines switch cooperatively (at await) not preemptively. This enables thousands of concurrent operations with minimal memory compared to threads.
Q7:

How do you implement concurrent futures for parallel tasks?

Mid

Answer

Use ThreadPoolExecutor or ProcessPoolExecutor.
submit() runs tasks asynchronously.
Retrieve results using result() or as_completed().
Quick Summary: concurrent.futures provides a high-level interface for parallel tasks. ThreadPoolExecutor for I/O-bound parallel tasks. ProcessPoolExecutor for CPU-bound parallel tasks (bypasses GIL with separate processes). Submit tasks with executor.submit(fn, args) returns Future. executor.map(fn, iterable) for parallel map. Use with context manager: with ThreadPoolExecutor(max_workers=4) as executor:.
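
A short sketch using ThreadPoolExecutor for I/O-bound work; the URLs are placeholders, and swapping in ProcessPoolExecutor would parallelize CPU-bound functions instead:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import urllib.request

    urls = [
        "https://example.com",
        "https://example.org",
    ]

    def fetch(url):
        with urllib.request.urlopen(url, timeout=5) as resp:
            return url, resp.status

    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future per task; as_completed yields them as they finish.
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            try:
                url, status = future.result()
                print(url, status)
            except Exception as exc:
                print(futures[future], "failed:", exc)
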
Q8:

How does Python handle sockets and networking?

Mid

Answer

socket module supports TCP/UDP networking.
Used for client-server communication.
asyncio supports asynchronous networking for high concurrency.
Quick Summary: Python socket programming: socket.socket(AF_INET, SOCK_STREAM) for TCP. bind(), listen(), accept() for server. connect() for client. send() and recv() for data transfer. Use selectors module for non-blocking I/O. asyncio has async socket support. High-level: use requests (HTTP), websockets library, or Twisted for network protocols. Raw sockets rarely needed - use appropriate high-level library.
Q9:

How do you make HTTP requests in Python?

Mid

Answer

Use requests for synchronous HTTP calls.
Use aiohttp or httpx for asynchronous requests.
Supports headers, auth, JSON, and streaming.
Quick Summary: HTTP requests in Python: requests library (simplest, synchronous). response = requests.get(url, headers={}, params={}, timeout=5). POST: requests.post(url, json=data). Session for connection pooling. For async HTTP: aiohttp or httpx. Always set timeouts. Handle errors with response.raise_for_status(). Use sessions for multiple requests to the same host (connection reuse).
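
A small sketch with the requests library (assumed installed) against a placeholder endpoint:

    import requests

    session = requests.Session()   # reuses connections across requests to the same host

    try:
        resp = session.get(
            "https://api.example.com/users",   # placeholder URL
            params={"page": 1},
            headers={"Accept": "application/json"},
            timeout=5,                          # always set a timeout
        )
        resp.raise_for_status()                 # raise on 4xx/5xx responses
        users = resp.json()
    except requests.RequestException as exc:
        print("request failed:", exc)
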
Q10:

What are Python design patterns?

Mid

Answer

Patterns include Singleton, Factory, Observer, Strategy, Decorator.
Improve maintainability, structure, and scalability.
Quick Summary: Common Python design patterns: Singleton (one instance - use a module-level variable or a metaclass), Factory (create objects without specifying the exact class), Observer (publish-subscribe with callbacks), Strategy (swap algorithms by passing functions), Decorator (wrap objects or functions to add behavior - Python decorators, preserving metadata with functools.wraps), Command (encapsulate actions as objects). Python's duck typing and first-class functions make many patterns simpler than in Java/C++.
Q11:

How do Python weak references work?

Mid

Answer

weakref allows referencing objects without preventing GC.
Useful for caching and avoiding memory leaks.
Quick Summary: Weak references (weakref module) hold references to objects without preventing garbage collection. When the referenced object has no strong references left, it's collected even if a weak reference exists. weakref.ref(obj) creates one. weakref.WeakValueDictionary caches objects without preventing GC. Use for: caches that should release memory under pressure, observer patterns without strong coupling.
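
A minimal sketch showing that a weak reference does not keep its target alive (behavior shown assumes CPython's immediate reference counting):

    import weakref

    class BigObject:
        pass

    obj = BigObject()
    ref = weakref.ref(obj)          # weak reference: does not add a strong reference
    print(ref() is obj)             # True - target still alive

    del obj                         # drop the only strong reference
    print(ref())                    # None - target has been collected

    # WeakValueDictionary: entries disappear when their values are collected
    cache = weakref.WeakValueDictionary()
    item = BigObject()
    cache["key"] = item
    del item
    print(list(cache.keys()))       # [] once the object is gone
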
Q12:

How do you handle file and directory operations?

Mid

Answer

Use os and pathlib for creating, deleting, and navigating files/directories.
Use shutil for copy/move/archive operations.
Quick Summary: pathlib.Path is the modern way: Path("dir") / "file.txt" for path joining. path.exists(), path.is_file(), path.is_dir(). path.mkdir(parents=True, exist_ok=True). path.glob("*.txt") for file search. shutil.copy(), shutil.move(), shutil.rmtree() for file operations. os.walk() or Path.rglob() for directory traversal. Prefer pathlib over os.path for cleaner code.
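
A short pathlib sketch; it uses a temporary directory so it runs without touching real files:

    from pathlib import Path
    import shutil
    import tempfile

    base = Path(tempfile.mkdtemp())              # scratch directory for the demo
    data_dir = base / "data" / "raw"
    data_dir.mkdir(parents=True, exist_ok=True)  # create nested directories

    report = data_dir / "report.txt"
    report.write_text("hello\n")

    print(report.exists(), report.is_file())          # True True
    print([p.name for p in data_dir.glob("*.txt")])   # ['report.txt']

    shutil.copy(report, base / "report_copy.txt")     # copy a file
    shutil.rmtree(base)                               # remove the whole tree
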
Q13:

How do you serialize and deserialize custom Python objects?

Mid

Answer

Use json.dumps() with custom default handlers.
pickle supports full object serialization.
Avoid untrusted pickle data for security.
Quick Summary: Custom object serialization: implement __getstate__ and __setstate__ for pickle (control what's pickled). For JSON: implement a custom JSONEncoder subclass with default() method, or convert to dict before json.dumps(). dataclasses.asdict() for dataclasses. marshmallow Schema for validation + serialization. attrs library with converters. Keep serialized format stable for versioned data.
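
A minimal sketch of JSON-serializing a custom object with a JSONEncoder subclass; the User dataclass is illustrative:

    import json
    from dataclasses import dataclass, asdict
    from datetime import datetime

    @dataclass
    class User:
        name: str
        joined: datetime

    class UserEncoder(json.JSONEncoder):
        def default(self, o):
            # Called only for objects json doesn't know how to serialize.
            if isinstance(o, User):
                return asdict(o)
            if isinstance(o, datetime):
                return o.isoformat()
            return super().default(o)

    user = User("ada", datetime(2024, 1, 1))
    payload = json.dumps(user, cls=UserEncoder)
    print(payload)   # {"name": "ada", "joined": "2024-01-01T00:00:00"}
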
Q14:

How do you profile Python code?

Mid

Answer

Use cProfile, profile, or timeit.
Use line_profiler for detailed bottleneck analysis.
Optimize slow loops and hotspots.
Quick Summary: Python profiling: cProfile for whole-program, function-level profiling (much lower overhead than the pure-Python profile module). line_profiler for line-by-line analysis: decorate with @profile, run with kernprof -l. memory_profiler for memory: decorate with @profile, run with python -m memory_profiler. py-spy for sampling live production processes (no code changes needed). Visualize cProfile results with snakeviz. Profile, find the bottleneck, optimize, measure again.
Q15:

How do you implement caching in advanced scenarios?

Mid

Answer

Use lru_cache for memoization.
Use Redis or Memcached for distributed caching.
Support TTL and size-based eviction.
Quick Summary: Advanced caching: TTL cache (cachetools.TTLCache) expires entries after N seconds. LRU cache (functools.lru_cache, cachetools.LRUCache) evicts least-recently-used. Redis for distributed caching across processes/servers. Cache-aside: check cache, miss -> load from DB -> store in cache. Stampede prevention: probabilistic early expiry or Redis locking during cache population.
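
Two small sketches: functools.lru_cache for in-process memoization, and a TTL cache from the third-party cachetools library (assumed installed):

    from functools import lru_cache

    @lru_cache(maxsize=256)
    def expensive(n):
        # Computed once per distinct argument, then served from the cache.
        return sum(i * i for i in range(n))

    expensive(10_000)               # computed
    expensive(10_000)               # cache hit
    print(expensive.cache_info())   # hits, misses, current size

    # TTL cache: entries expire 60 seconds after insertion (requires cachetools)
    from cachetools import TTLCache

    cache = TTLCache(maxsize=1000, ttl=60)
    cache["user:42"] = {"name": "ada"}
    print(cache.get("user:42"))
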
Q16:

How do you handle logging in distributed Python applications?

Mid

Answer

Use structured JSON logging.
Forward logs to ELK, Fluentd, or centralized servers.
Include correlation IDs for traceability.
Quick Summary: Distributed logging: use structured logging (JSON format) so logs are parseable. Include trace/correlation IDs to link log entries across services. Centralize logs with ELK stack or cloud logging. Use logging.handlers.SysLogHandler to forward to syslog. python-json-logger library for JSON output. In async code, pass context via contextvars to ensure correlation IDs are included in all log entries.
Q17:

How do you implement custom context managers?

Mid

Answer

Define __enter__ and __exit__ methods.
Or use contextlib for simplified managers.
Ensures proper cleanup of resources.
Quick Summary: Custom context managers: class-based: implement __enter__ and __exit__. __exit__ receives exception info (exc_type, exc_val, exc_tb) and can suppress exceptions by returning True. Function-based (simpler): use @contextlib.contextmanager decorator, yield once (code before yield = __enter__, after yield = __exit__). contextlib.ExitStack manages multiple context managers dynamically.
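
Two small sketches of the same timing context manager, one class-based and one with contextlib; the names are illustrative:

    import time
    from contextlib import contextmanager

    class Timer:
        def __enter__(self):
            self.start = time.perf_counter()
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            self.elapsed = time.perf_counter() - self.start
            return False   # don't suppress exceptions from the with-block

    @contextmanager
    def timer():
        start = time.perf_counter()
        try:
            yield                      # the with-block body runs here
        finally:
            print(f"took {time.perf_counter() - start:.3f}s")

    with Timer() as t:
        sum(range(1_000_000))
    print(t.elapsed)

    with timer():
        sum(range(1_000_000))
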
Q18:

How do you implement async iterators and async generators?

Mid

Answer

Async generator uses async def with yield.
Async iterator defines __aiter__ and __anext__.
Useful for streaming async data.
Quick Summary: Async iterators implement __aiter__ and __anext__ (async def). Async generators use yield inside async def functions. Consume with async for. Use case: stream data from an async source (websocket, database cursor) without blocking. Example: async def stream_data(): async for row in db.cursor.fetch(): yield row. asyncio.Queue works well with async generators for producer-consumer.
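
A minimal sketch of an async generator consumed with async for; asyncio.sleep stands in for a real async data source:

    import asyncio

    async def stream_numbers(limit):
        for i in range(limit):
            await asyncio.sleep(0.1)   # simulate waiting on an async source
            yield i                    # async generator: yield inside async def

    async def main():
        async for value in stream_numbers(3):
            print("received", value)

    asyncio.run(main())
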
Q19:

What are the main libraries for data analysis in Python?

Mid

Answer

NumPy for numerical computation.
pandas for DataFrame-based manipulation.
SciPy for scientific computing.
Matplotlib and Seaborn for visualization.
Quick Summary: Python data analysis libraries: NumPy (numerical computing, arrays), pandas (data manipulation, DataFrames), SciPy (scientific computing), statsmodels (statistical models). Data viz: matplotlib, seaborn, plotly. Machine learning: scikit-learn, XGBoost, LightGBM. Deep learning: TensorFlow, PyTorch. Jupyter notebooks for interactive exploration. These form the Python data science ecosystem.
Q20:

What is NumPy and why is it important?

Mid

Answer

NumPy provides multi-dimensional arrays and vectorized operations.
Allows fast computation and broadcasting.
Foundation for scientific and ML libraries.
Quick Summary: NumPy provides N-dimensional arrays (ndarray) with vectorized operations - much faster than Python lists. Operations apply to entire arrays without Python loops (C-level speed). Broadcasting: arrays with different shapes can operate together. Key operations: array creation, indexing/slicing, math operations, linear algebra (linalg), FFT, random numbers. Foundation for pandas, scikit-learn, and PyTorch.
Q21:

What is pandas and how is it used?

Mid

Answer

pandas offers Series and DataFrame structures.
Supports filtering, grouping, merging, reshaping.
Ideal for cleaning and preprocessing data.
Quick Summary: pandas provides DataFrame (2D table) and Series (1D column) data structures. Key operations: read_csv/read_excel, select columns (df["col"]), filter rows (df[df.age > 18]), groupby(), merge() for SQL-like joins, pivot_table(), apply() for custom transformations, fillna() for missing data. Use vectorized operations (not loops) for performance. iloc for integer-based, loc for label-based indexing.
Q22:

What are Python data visualization tools?

Mid

Answer

Matplotlib for low-level charts.
Seaborn for statistical visualizations.
Plotly/Bokeh for interactive plots.
Quick Summary: Python data viz: matplotlib (low-level, flexible, foundation of others). seaborn (statistical plots on top of matplotlib, beautiful defaults). plotly (interactive charts, works in Jupyter and web). Altair (declarative, grammar of graphics). bokeh (interactive web plots). In Jupyter: %matplotlib inline. For production dashboards: Dash (plotly) or Streamlit. Choose matplotlib for static, plotly for interactive.
Q23:

How do you handle missing data in pandas?

Mid

Answer

Identify missing values with isnull/notnull.
Fill with mean/median/mode or custom values.
Drop rows or columns when appropriate.
Quick Summary: Handle missing data in pandas: detect with df.isna() or df.isnull(). Drop: df.dropna(axis=0) rows, dropna(axis=1) columns, dropna(subset=["col"]). Fill: df.fillna(value), fillna(df.mean()), fillna(method="ffill") forward fill, bfill backward fill. Interpolate: df.interpolate(). For ML: use SimpleImputer from scikit-learn to fill with mean/median/mode in pipelines.
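
A small pandas sketch of detecting and filling missing values; pandas and NumPy are assumed installed and the data is made up:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "age":  [25, np.nan, 40, 31],
        "city": ["Oslo", "Lima", None, "Pune"],
    })

    print(df.isna().sum())                            # missing count per column

    df["age"] = df["age"].fillna(df["age"].mean())    # numeric: fill with the mean
    df["city"] = df["city"].fillna("unknown")         # categorical: fill with a label
    df = df.dropna()                                  # drop any rows still incomplete
    print(df)
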
Q24:

How do you handle categorical data?

Mid

Answer

Convert categories to numeric using encoding.
One-hot for nominal, label encoding for ordinal.
Required for ML algorithms.
Quick Summary: Handle categorical data: pandas Categorical type for memory efficiency. One-hot encoding: pd.get_dummies(df["col"]) or scikit-learn OneHotEncoder. Label encoding: LabelEncoder for ordinal data with natural order. Ordinal encoding: OrdinalEncoder when order matters. Target encoding for high-cardinality categoricals. Always encode test data using fit from training data to prevent data leakage.
Q25:

How do you normalize or standardize data?

Mid

Answer

Normalization scales values to 0–1.
Standardization gives mean 0 and std 1.
Ensures equal feature contribution.
Quick Summary: Normalization scales to [0,1]: (x - min)/(max - min) with MinMaxScaler. Standardization scales to mean=0, std=1: (x - mean)/std with StandardScaler. Use standardization for algorithms that assume normal distribution (linear regression, SVM, neural nets). Normalization for bounded ranges. Always fit scaler on training data only, then transform test data to prevent data leakage.
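
A short scikit-learn sketch of scaling fit on training data only; scikit-learn is assumed installed and the arrays are toy data:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
    X_test = np.array([[2.5], [10.0]])

    scaler = StandardScaler()
    X_train_std = scaler.fit_transform(X_train)   # fit on training data only
    X_test_std = scaler.transform(X_test)         # reuse training mean/std - no leakage

    minmax = MinMaxScaler().fit(X_train)          # scales to [0, 1] using the train min/max
    print(minmax.transform(X_test))               # test values may fall outside [0, 1]
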
Q26:

What is scikit-learn and why is it used?

Mid

Answer

scikit-learn provides ML algorithms, preprocessing tools, model evaluation, and pipelines.
Quick Summary: scikit-learn provides consistent API for ML: fit(X_train, y_train) trains the model, predict(X_test) makes predictions, score() evaluates. Includes: preprocessing (scalers, encoders), feature selection, dimensionality reduction (PCA), classification, regression, clustering, cross-validation, pipelines. Consistent API makes switching algorithms easy for experimentation.
Q27:

What are common machine learning algorithms in Python?

Mid

Answer

Supervised: Linear/Logistic Regression, SVM, Decision Trees, Random Forests.
Unsupervised: K-means, PCA.
Ensemble: Boosting, Bagging.
Quick Summary: Common ML algorithms in Python: Linear/Logistic Regression (baseline, interpretable). Decision Trees and Random Forest (ensemble, handles non-linearity). Gradient Boosting (XGBoost, LightGBM - usually best for tabular data). SVM for small-medium datasets. K-Means for clustering. Neural networks (PyTorch, TensorFlow) for images and NLP. Start simple, add complexity only when needed.
Q28:

How do you split datasets for training and testing?

Mid

Answer

Divide data into training and test sets.
Optionally add validation set.
Prevents overfitting and checks generalization.
Quick Summary: Split data to evaluate model performance on unseen data. train_test_split(X, y, test_size=0.2, random_state=42) gives 80% train, 20% test. Stratified split for imbalanced classes (stratify=y). Never train on test data. Validation set for hyperparameter tuning (or use cross-validation). Test set is used only once at the very end to report final performance.
Q29:

What are pipelines in machine learning?

Mid

Answer

Combine preprocessing and models into a workflow.
Ensures consistent transformations.
Improves reproducibility.
Quick Summary: Scikit-learn Pipeline chains preprocessing and model into one object: Pipeline(steps=[("scaler", StandardScaler()), ("model", LogisticRegression())]). Benefits: prevents data leakage (fit applies only to training data), easier cross-validation, model serialization includes preprocessing, clean code. Combine with GridSearchCV for hyperparameter tuning across all steps.
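
A minimal Pipeline plus GridSearchCV sketch on a built-in dataset; scikit-learn is assumed installed:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    pipe = Pipeline([
        ("scaler", StandardScaler()),                 # refit on each training fold only
        ("model", LogisticRegression(max_iter=1000)),
    ])

    # Tune a step's parameter with the "step__param" naming convention.
    grid = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.score(X_test, y_test))
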
Q30:

How do you evaluate machine learning models?

Mid

Answer

Classification: Accuracy, Precision, Recall, F1, AUC.
Regression: MSE, MAE, R2.
Used to compare and select models.
Quick Summary: Model evaluation metrics: Classification: accuracy, precision, recall, F1-score (use when classes imbalanced), ROC-AUC, confusion matrix. Regression: MAE (mean absolute error), MSE, RMSE, R-squared. Use cross_val_score for cross-validated scores. Classification report gives all classification metrics at once. Choose metric based on business impact of false positives vs false negatives.
Q31:

How do you handle overfitting and underfitting?

Mid

Answer

Overfitting: Reduce complexity, regularization, cross-validation.
Underfitting: Increase model complexity or features.
Quick Summary: Overfitting: model learns training data too well, poor on new data. Underfitting: model too simple, poor on both. Fixes for overfitting: more training data, regularization (L1/L2), dropout (neural nets), simpler model, cross-validation. Fixes for underfitting: more features, more complex model, less regularization. Learning curves (train vs validation error vs dataset size) diagnose the problem.
Q32:

How do you save and load machine learning models?

Mid

Answer

Use pickle or joblib for serialization.
Framework-specific save/load for deep learning.
Allows reuse without retraining.
Quick Summary: Save ML models: joblib.dump(model, "model.pkl") and joblib.load() for scikit-learn (handles NumPy arrays efficiently). pickle.dump() works too but joblib is preferred for large models. TensorFlow: model.save("model_dir") - SavedModel format. PyTorch: torch.save(model.state_dict(), "model.pth") and model.load_state_dict(torch.load()). For production: ONNX format for cross-framework deployment.
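
A small sketch of saving and reloading a scikit-learn model with joblib (installed alongside scikit-learn):

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    joblib.dump(model, "model.joblib")        # serialize the trained model to disk

    restored = joblib.load("model.joblib")    # reload later without retraining
    print(restored.predict(X[:3]))
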
Q33:

How do you implement feature selection?

Mid

Answer

Use correlation, recursive feature elimination, or model-based selectors.
Improves performance and reduces dimensionality.
Quick Summary: Feature selection reduces noise and overfitting, speeds training. Methods: filter (correlation, chi-squared, mutual information - no model needed), wrapper (RFE - recursive feature elimination, trains model), embedded (L1 regularization - Lasso sets irrelevant feature weights to 0). SelectKBest, RFECV in scikit-learn. Feature importance from tree models. Start with filter methods for speed.
Q34:

How do you handle time series data?

Mid

Answer

Use datetime indexing, resampling, rolling windows.
Model trends, seasonality.
Use pandas and statsmodels.
Quick Summary: Time series data handling: parse dates with pd.to_datetime(), set DatetimeIndex. Resample: df.resample("1H").mean() for hourly aggregation. Rolling statistics: df.rolling(7).mean() for 7-day moving average. Lag features: df["lag_1"] = df["value"].shift(1). Seasonality and trend decomposition: statsmodels.tsa.seasonal_decompose. Models: ARIMA, Prophet (Facebook), LSTM (deep learning).
Q35:

How do you perform cross-validation?

Mid

Answer

Split data into multiple folds.
Train and test repeatedly.
Ensures robust model evaluation.
Quick Summary: Cross-validation evaluates model performance more reliably than single train/test split. KFold: split data into K folds, train on K-1, test on 1, repeat K times, average scores. StratifiedKFold for classification (preserves class distribution in each fold). cross_val_score(model, X, y, cv=5) does 5-fold CV. Use for: hyperparameter tuning (GridSearchCV uses CV internally), comparing models, reporting performance.
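
A short cross_val_score sketch with stratified 5-fold cross-validation; scikit-learn is assumed installed:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(scores.mean(), scores.std())   # average score and spread across the 5 folds
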
Q36:

What are Python tools for NLP?

Mid

Answer

NLTK for tokenization and parsing.
spaCy for fast NLP pipelines.
Text preprocessing for ML.
Quick Summary: Python NLP tools: NLTK (classical NLP - tokenization, stemming, POS tagging, good for learning). spaCy (production NLP - fast, pre-trained models, named entity recognition). Hugging Face Transformers (BERT, GPT models - state of the art). Gensim (word embeddings - Word2Vec). TextBlob (simple sentiment analysis). Use spaCy for production, Transformers for deep learning NLP tasks.
Q37:

How do you handle large datasets in Python?

Mid

Answer

Use chunking or lazy loading.
Use vectorized NumPy/pandas operations.
Use Dask or PySpark for distributed computing.
Quick Summary: Large dataset handling in Python: pandas chunking: pd.read_csv(file, chunksize=10000) reads in chunks. Dask (parallel pandas on multiple cores or clusters). Polars (Rust-based, much faster than pandas for large files). Vaex (lazy evaluation, out-of-core computation). Parquet format (columnar, compressed) instead of CSV. PySpark for distributed processing at very large scale.
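
A small sketch of processing a large CSV in chunks with pandas; the file path and column name are placeholders:

    import pandas as pd

    total = 0
    row_count = 0

    # Stream the file 100k rows at a time instead of loading it all into memory.
    for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
        total += chunk["amount"].sum()       # "amount" is a placeholder column
        row_count += len(chunk)

    print(f"{row_count} rows, total amount = {total}")
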
Q38:

How do you deploy Python ML models?

Mid

Answer

Expose models using Flask, FastAPI, or Django.
Containerize with Docker.
Use CI/CD and cloud platforms for production.
Quick Summary: Deploy Python ML models: Flask or FastAPI as REST API wrapper around the model. Serialize model with joblib/pickle, load on startup. Docker container for portability. Kubernetes for scaling. MLflow for experiment tracking and model registry. BentoML or Seldon for ML-specific serving with versioning. Monitor predictions in production (data drift, performance degradation). Batch prediction jobs for non-real-time use cases.
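
A minimal FastAPI sketch that loads a serialized model at startup and serves predictions; FastAPI, uvicorn, and joblib are assumed installed, and model.joblib plus the feature list are placeholders:

    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = joblib.load("model.joblib")   # load once at startup, not per request

    class Features(BaseModel):
        values: list[float]               # placeholder feature vector

    @app.post("/predict")
    def predict(features: Features):
        prediction = model.predict([features.values])
        return {"prediction": prediction.tolist()}

    # Run with: uvicorn app:app --host 0.0.0.0 --port 8000 (module name is a placeholder)
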
